Abstract

The information leakage of electronic devices, especially those used in cryptographic
or other vital applications, represents a serious practical threat to secure
systems.  While  physical  implementation  attacks  have  evolved  rapidly  over  the  last 
decade,  relatively  little  work  has  been  done  to  allow  system  designers  to  effectively 
counter  the  identified  threats.  This  work  addresses  the  technology  gap  between 
identified  problems  and  potential  solutions,  and  significantly  advances  the  study  of 
information  leakage  in  two  primary  areas  of  investigation: 

1. Radio-Frequency “Distinct Native Attribute” (RF-DNA) fingerprinting of integrated
circuits (ICs) for device authentication, and

2. Leakage mapping to assess the information leakage of arbitrary cryptographic
implementations.

First,  the  RF-DNA  fingerprinting  technique  is  used  to  recognize  unique  ICs 
based  on  fabrication  process-induced  variations  in  unintentional  electromagnetic 
(EM) emissions in a manner analogous to biometric human identification. The effectiveness
of the technique is demonstrated through an extensive empirical study,
indicating  the  technique  scales  well  for  both  identification  and  verification  tasks. 
Empirical  results  are  presented  for  40  near-identical  devices,  with  correct  device 
identification success rates of greater than 99.5%, and average verification equal error
rates (EERs) of less than 0.05%. Correct identification success rates exceeding
90%  were  maintained  under  analysis  conditions  of  SNR  >  15  dB. 

Whereas  all  previously  known  techniques  require  device  hardware  or  software 
modifications,  RF-DNA  fingerprinting  permits  opportunistic  passive  authentication 
using  unintentional  RF  emissions  during  pre-existing  processes  and  protocols.  This 
characteristic makes the approach suitable for security applications involving commodity
commercial ICs, with substantial cost and scalability advantages over previous
approaches.

Second, a systematic leakage mapping methodology is developed and demonstrated
to comprehensively assess the information leakage of arbitrary block cipher
implementations.  The  proposed  framework  provides  a  comprehensive  approach  to 
assess the information leakage of all algorithmically specified key-dependent intermediate
computations for implementations of symmetric block ciphers. The resulting
leakage  assessment  quantitatively  bounds  the  resistance  of  an  implementation  to  the 
general  class  of  differential  side  channel  analysis  techniques,  and  provides  system 
designers  and  evaluators  with  a  tool  to  objectively  assess  whether  countermeasures 
implemented  are  justified  given  the  added  cost  in  time,  space,  and  energy  compared 
to the obtained reduction in exploitable information leakage. Furthermore, the systematic
approach enables evaluators to quickly and efficiently repeat the assessment
process for different variations of implementations, which helps to ensure the addition
of countermeasures does not inadvertently introduce new unexpected sources of
information  leakage. 

The leakage mapping framework is demonstrated using the well-known Hamming
Weight and Hamming Distance leakage models, with recommendations to extend
the technique using more accurate models. The effectiveness of the approach is
demonstrated through empirical assessment of two typical unprotected implementations
of the Advanced Encryption Standard, and the assessment results are empirically
validated against correlation-based differential power and electromagnetic analysis
attacks.


EXPLOITATION OF UNINTENTIONAL INFORMATION LEAKAGE FROM INTEGRATED CIRCUITS


1.  Introduction 

It  is  common  knowledge  that  electronic  equipment  radiates  electromagnetic 
(EM)  energy  that  can  interfere  with  other  nearby  devices.  It  is  for  this  reason 
that airline passengers are required to “turn off all portable electronic devices”, and
consumer electronics sold in the U.S. are required to undergo certification testing for
compliance with Federal Communications Commission (FCC) regulations [FCC09].
Digital  devices  that  incorporate  clocks,  oscillators,  or  other  high  frequency  pulses  are 
specifically  regulated  as  known  unintentional  emitters  because  they  produce  radio 
frequency  (RF)  radiation. 

Over  the  past  decade  there  has  been  a  growing  realization  that  the  content  of 
unintentional  emissions,  in  addition  to  being  a  source  of  interference,  can  also  be 
a  source  of  information  about  the  emission  producing  device’s  internal  state.  This 
realization  has  profound  implications  for  the  physical  security  of  sensitive  electronic 
systems  since  in  many  instances  the  “leaked”  state  information  is  sufficient  to  infer 
precise  details  about  the  operations  the  device  is  performing  and  the  data  it  is 
processing.  Such  details  can  include  extremely  sensitive  private  information  such  as 
cryptographic  key  material.  This  research  studies  the  limits  of  how  much  information 
can  be  gained  by  exploiting  the  unintentional  information  leaked  from  secure  systems. 

1.1  Problem  Addressed 

When viewed externally, all physical systems produce both intended and unintended
outputs. The unintended outputs are quantifiable, physically observable
phenomena produced as a side-effect of normal operation. When an unintended
observable outcome is correlated to some aspect of the internal state or system operation,
a side-channel is said to leak information. The term side-channel was first
introduced by Kelsey et al. [KSWH00]. However, military and government agencies
have been aware of information leakage due to unintentional emissions for decades
prior [Boa73, vL85], as early as 1914 by some reports [And01].

Figure 1.1 Side-Channel Leakage from Physical Systems.


Definition  1  (Side-Channel)  Any  unintended  physically  observable  time-varying 
phenomenon that is correlated to the internal state, operations, or data being processed
by a device of interest.


EM  radiation,  including  RF  radiation  and  other  regions  of  the  EM  spectrum, 
is  just  one  of  many  different  side-channels  through  which  electronic  devices  may  leak 
information.  Other  channels  include  variations  in  power  consumption,  variations  in 
computation  time,  and  even  acoustic  and  thermal  emissions.  The  various  known 
sources of side-channel emissions are depicted in the block diagram of a notional
two-party communications system shown in Figure 1.1. The underlying phenomena
that cause these emissions are described in further detail in Section 2.2.



1.2  Security  Implications  of  Information  Leakage 


The side-channel information leakage from electronic devices is of intense interest
to security researchers since it presents attack vectors through which unauthorized
parties may gain access to information that is intended to be kept private.
Information leakage may inadvertently reveal the contents of protected data such as
passwords, cryptographic keys, or PIN numbers. Further, it may allow an adversary
to reverse engineer critical aspects of intellectual property or critical technologies
such as proprietary algorithms, circuit designs, or protocols. Therefore, unintended
leakage of information poses a serious practical security threat to electronic systems,
particularly cryptographic systems, when naive designers assume a closed, secure
environment where only the intended outputs are visible [ZF05].


Over  the  past  decade,  a  significant  body  of  research  has  been  dedicated  to 
applying  side-channel  analysis  (SCA)  as  an  attack  mechanism  to  defeat  the  security 
of cryptographic devices. Typically, the earlier research has focused on the vulnerability
of systems to key recovery attacks, i.e., the extraction of private or secret key
material from a cryptographic system. The existence of SCA techniques means the
security of sensitive information that relies on the secrecy of a cryptographic key as
a primary mode of protection can be greatly reduced in practice if access to side-channel
emissions is realistically possible. The type of information that is typically
protected varies widely, but includes things such as the content of secure communications,
software or firmware code which reveals critical technology or intellectual
property  details,  and  tokens  permitting  access  to  secure  networks  or  systems. 


The  protection  of  sensitive  proprietary  algorithms  or  techniques  (i.e.,  critical 
technology or intellectual property) is of particular importance in both military and
commercial applications. Reverse engineering poses a serious threat since it can enable
competitors or adversaries to bypass years of research and development through
counterfeiting or theft of intellectual property. Commercially, the total global loss in
revenue due to counterfeiting and pirated products is predicted to reach $1.7 trillion
by 2015 [Fro11]. This estimate does not account for intangible losses such as brand
deterioration.  Likewise,  in  military  applications,  compromise  of  critical  technology 
degrades  weapon  system  combat  effectiveness  and  useful  life  expectancy. 


Whereas ‘traditional’ techniques for reverse engineering of integrated circuits
(ICs) still require very expensive, specialized equipment (e.g., scanning electron
microscopes) [TJ09], side-channel attacks can be performed using widely available (and
relatively inexpensive) commercial tools. Commercial SCA systems are available
for sale on the open market, including the Riscure Inspector system used in this
research [Ris09], and source code for carrying out a variety of attacks is readily
available on the Internet [Par10, OT11]. Since it is increasingly common for critical
technologies  or  intellectual  property  to  be  in  software  or  firmware  protected  by  a 
cryptographic  system,  the  existence  of  SCA  key  extraction  techniques  must  be  a 
consideration  when  assessing  the  security  level  of  any  such  systems. 


1.3  Research  Objective 

The overall research objective is to investigate various aspects of the information
leakage of ICs. The study of information leakage from electronic devices,
particularly ICs, is a relatively new area of investigation and researchers are only
beginning to formalize the problem and develop rigorous methods for assessing the
vulnerability of physical devices to SCA techniques. Most research to date has focused
on the exploitation of cryptographic systems, with a particular emphasis on
key  recovery  attacks.  Thus  far,  very  little  work  has  been  done  that  enables  system 
designers  to  effectively  counter  the  threat  of  side-channel  attacks  or  to  investigate 
alternative,  constructive  uses  of  the  phenomena.  Investigating  alternative  aspects  of 
information  leakage  is  the  focus  of  this  work,  and  includes  two  main  thrusts. 

First,  unintentional  emissions  are  investigated  as  a  source  of  information  to 
recognize  or  verify  the  identity  of  a  unique  IC.  The  problem  of  IC  authentication  has 
numerous practical applications, including providing enhanced security for secure
access mechanisms (e.g., anti-cloning), detection of unauthorized modifications to
circuit designs (e.g., hardware Trojan detection), or forensic attribution of electronic
evidence in criminal or other cases. Whereas all previously known authentication
techniques require either device hardware or operational software modifications,
unintentional emissions can be passively analyzed. Thus, an effective authentication
approach  based  on  the  unintentional  emissions  of  a  device  would  result  in  a  more 
cost-effective  and  scalable  approach  to  the  problem  than  existing  solutions. 

The  second  thrust  is  to  investigate  techniques  that  enable  system  designers  to 
effectively  and  systematically  assess  the  vulnerability  of  a  particular  cryptographic 
implementation to known side-channel attack techniques. From a cryptographic system
designer or engineer’s perspective, designing a system that is secure against the
plethora  of  rapidly  evolving  physical  implementation  attacks  is  daunting.  While  new 
or  enhanced  attack  techniques  continue  to  be  published  at  a  rapid  pace,  very  little 
work  has  been  done  to  aid  system  designers  in  practically  addressing  the  resulting 
security  risks.  This  research  investigates  the  development  of  a  systematic  leakage 
mapping  framework  to  guide  system  designers  in  making  sound  decisions  during 
the  development  process  to  obtain,  with  some  degree  of  certainty,  a  desired  level  of 
resistance  to  side-channel  attacks. 


1.4  Dissertation  Organization 

This document is divided into five chapters. Chapter 2 provides a historical
overview  of  the  relevant  work  done  to  date,  and  introduces  fundamental  concepts 
necessary  for  the  study  of  side-channel  emissions  and  information  leakage.  Additional 
relevant,  but  not  critical,  background  information  is  included  as  appendices. 


The main document body is composed of two chapters containing scholarly articles
prepared during this research: Intrinsic Physical Layer Authentication of Integrated
Circuits (Chapter 3) [CLB+11] and Leakage Mapping: A Systematic Methodology
for Assessing the Side-Channel Information Leakage of Cryptographic Implementations
(Chapter 4). Each chapter contains the article text, including relevant
background and methodology, as submitted for publication with some editing as required
for incorporation into the format of this document. Because the submitted
versions of each article were constrained by mandatory page limitations, some additional
supplementary explanation is provided at the end of each chapter to expand
on the methodology and results in more detail where appropriate.

Chapter 5 concludes the main document, summarizes the key findings of
this work, and provides several recommendations for future research.

Finally,  selected  source  code  developed  during  this  research  is  included  in  the 
appendices. A full archive of the source code and data sets used to produce the
results is available separately on electronic media.
2.  Background


2.1  Overview

This chapter introduces the key concepts and techniques investigated in this research.
Side channel information leakage by electronic systems is a multi-disciplinary
field that synthesizes a variety of sub-domains of knowledge, including:


•  Cryptography, 

•  Computer  architecture  and  engineering, 

•  Microelectronic  devices, 

•  Digital  and  VLSI  systems  design, 

•  Electromagnetic  theory, 

•  Probability  theory, 

•  Signal  processing  and  pattern  recognition, 

•  Communications  and  information  theory,  and 

•  Software  design. 


Where required, the relevant aspects of each field are covered to describe needed
concepts as they are introduced. The remainder of this chapter is laid out as follows.
Section 2.2 describes the principal sources of side channel information leakage and
their underlying causes. Section 2.3 describes the main applications of side channel
analysis to date, with an emphasis on cryptanalytic key recovery attacks. This
section also describes the basic aspects of cryptographic systems which make SCA
techniques feasible in practice. Section 2.5 reviews the primary classes of
cryptanalytic SCA techniques, including so-called simple, differential, and profiling
techniques. Section 2.6 describes a variety of proposed countermeasures to improve
the resistance of physical systems to SCA techniques. Finally, Section 2.7 discusses
important practical aspects of SCA attacks.
2.2  Emission Sources


Electronic devices create a variety of side channels during the normal course of
operation. Each channel can leak information when the characteristics of the physically
observable behavior are correlated to the electronic circuit’s internal operation
or state. This section reviews the common side channels of electronic devices,
including timing, power, electromagnetic, and acoustic (cf. Fig. 1.1).


2.2.1  Variations  in  Computation  Time.  The  time  for  a  microprocessor  or 
other  electronic  device  to  complete  tasks  is,  in  general,  variable  and  depends  on  the 
data processed [Koc96]. The reasons for variations in processing time are unique to
a particular implementation, but may include dependencies on conditional branches
or other conditional execution mechanisms, cache misses, pipeline stalls, waiting on
queries to external devices such as RAM, and so on [Koc96]. For combinational logic
devices, the time required for an output to reach a steady, glitch-free state may vary
depending on the inputs, circuit layout, or other factors [WH05].

A notional example of a simple algorithm with data-dependent computation
time in most implementations is shown in Algorithm 1.


Algorithm 1 Data-Dependence Example
Require: Integers x and y, Control bit b.

1: if b = 1 then return x^y
2: else return x + y
3: end if


The value, b, is some control bit and x and y are operands. If the control bit
is a ‘1’ an exponentiation is performed; if not, the operands are added. Assuming
an exponentiation operation is much more costly in terms of execution time than an
addition (which it would be in any typical implementation) the algorithm will take
longer when b is ‘1’ vs. ‘0’. Thus, an outside observer can infer the value of bit
b from the amount of time the operation takes to complete. The implementation,
therefore, is said to leak the value of the data bit b through a timing side channel.
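
As a rough illustration of Algorithm 1, the following Python sketch counts primitive operations as a stand-in for execution time. The function names, the naive repeated-multiplication exponentiation, and the operation-count cost model are illustrative assumptions rather than part of the original analysis; a real measurement would rely on wall-clock or cycle-count timing.

```python
def data_dependent_op(x, y, b):
    """Return (result, cost) for Algorithm 1; cost counts primitive operations.

    Assumption: exponentiation is done by naive repeated multiplication,
    so it costs y multiplications, while the addition costs one operation.
    """
    if b == 1:
        result, cost = 1, 0
        for _ in range(y):          # x^y via y successive multiplications
            result *= x
            cost += 1
        return result, cost
    return x + y, 1                 # single addition


def infer_control_bit(observed_cost, threshold=1):
    """Guess b from the observed cost alone, as an outside observer would."""
    return 1 if observed_cost > threshold else 0
```

With x = 3 and y = 20, the exponentiation branch costs 20 operations and the addition branch costs 1, so the observer's guess matches b in both cases. The distinction disappears when y is 0 or 1 under this cost model, which is one reason the example should be read as a sketch only.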


Kocher discovered that such unintentional variations in execution time can be
significantly correlated to the data processed by a device [Koc96]. In many cases
the correlation of the unintentional variations leaks sufficient information to recover
an entire key from a cryptographic device. The cryptanalytic applications of timing
and other SCA techniques are discussed in Section 2.3.1.


2.2.2 Variations in Power Consumption. Most modern integrated circuits,
including general purpose microprocessors, are based on complementary metal oxide
semiconductor (CMOS) transistor technology.¹

The power consumed by CMOS devices has both static and dynamic components.
Since the static component is nearly constant, it can be neglected for the
purposes of SCA since it is not data or operation dependent and therefore does not
leak information about the internal system state. Dynamic power consumption, on
the other hand, is a function of internal switching activity of individual transistors.
Since the switching activity depends on the operations performed and data manipulated,
the resulting variations in dynamic power consumption are a source of side
channel information leakage [MOP07].
Figure 2.1 Dynamic power dissipation of the CMOS inverter [Bak07].


¹ Strictly speaking, modern transistor technology is no longer necessarily metal- or oxide-based
since those material layers can be replaced by alternative materials. However, the term CMOS is
overwhelmingly used in the literature and general practice to refer to both true CMOS and other
technologies with similar behavior [WH05]. The differences are irrelevant to this work, and the
term is used to generically refer to all CMOS-like devices.

The total current drawn from a constant voltage power supply at any point in
time is the sum of the current drawn by all the individual logic cells. There are two
primary sources of dynamic power consumption in a CMOS circuit. The first is the
current drawn and dissipated when individual transistors are switched on and off
respectively. Fig. 2.1 illustrates the concept using a simple CMOS inverter. When
the input transitions from 1 to 0, M1 is switched off and M2 is switched on. The
cell draws a charging current from a constant voltage power supply to charge the
intrinsic and extrinsic capacitances (C_tot). When the input transitions from 0 to
1, M1 is switched on and M2 is switched off, and the stored energy is discharged
through the ground line. When the input remains the same, the transistors’ states
do not change and there is no dynamic current flow.

The  second  primary  source  of  dynamic  power  consumption  in  CMOS  logic  is 
the  short  circuit  path  between  the  power  rail  and  ground  created  during  the  short 
period  of  time  during  each  transition  when  both  the  pMOS  and  nMOS  networks 
are  partially  on.  This  short-circuit  current  contributes  significantly  to  the  dynamic 
power  consumption  of  a  circuit. 
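
The charging and discharging behavior of the inverter can be sketched as a toy model. This is only an illustration under stated assumptions: the function name and the values chosen for C_tot and V_dd are hypothetical, short-circuit current and glitches are ignored, and each 1-to-0 input transition is taken to draw a charge of C_tot · V_dd from the supply.

```python
def inverter_supply_charge(inputs, c_tot=10e-15, v_dd=1.2):
    """Charge drawn from the supply by an ideal CMOS inverter, per input step.

    Assumptions (illustrative values only): C_tot = 10 fF, V_dd = 1.2 V,
    no short-circuit current, no leakage. An input transition 1 -> 0 drives
    the output high and charges C_tot from the supply; a 0 -> 1 transition
    discharges C_tot to ground and draws nothing from the supply; an
    unchanged input causes no dynamic current.
    """
    drawn = []
    for prev, cur in zip(inputs, inputs[1:]):
        if prev == 1 and cur == 0:
            drawn.append(c_tot * v_dd)   # Q = C * V pulled from the supply
        else:
            drawn.append(0.0)
    return drawn
```

For the input sequence 1, 0, 0, 1, 0 the model reports supply activity only on the first and last steps. The positions of the nonzero entries depend entirely on the data, which is the property that power analysis exploits.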


Combinational CMOS circuits also experience transient signal behavior, known
as glitches or dynamic hazards, as inputs propagate and arrive at different times
through the circuit. In large or complex combinational circuits, dynamic power
consumption is influenced to a large degree by the glitches encountered before the
circuit settles into its intended steady-state output, to the point that glitches may
actually become the dominant source of dynamic power consumption [MOP07].
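
A minimal sketch of a glitch, assuming a single two-input XOR gate with zero gate delay and hypothetical integer arrival times for each input, is given below. The function names and the discrete time model are assumptions made for illustration; real glitch behavior depends on layout and gate delays.

```python
def xor_output_waveform(old, new, arrival, horizon):
    """Output of an ideal XOR gate while its two inputs change at different times.

    old, new : (a, b) input values before and after the change
    arrival  : (ta, tb) time step at which each input takes its new value
    horizon  : number of time steps to simulate
    """
    wave = []
    for t in range(horizon):
        a = new[0] if t >= arrival[0] else old[0]
        b = new[1] if t >= arrival[1] else old[1]
        wave.append(a ^ b)
    return wave


def count_transitions(wave):
    """Number of output toggles, each of which costs dynamic power."""
    return sum(1 for x, y in zip(wave, wave[1:]) if x != y)
```

When both inputs change from 0 to 1 but arrive at steps 1 and 3, the output briefly rises and falls again, giving two transitions even though the steady-state output is unchanged. With equal arrival times there are none.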


For  microprocessor  devices,  power  consumption  is  affected  by  the  instruction 
being  executed,  the  content  of  data  being  manipulated,  and  the  address  or  location 
of memory or data registers being accessed [QS02]. For pipelined architectures, there
may be two or more instructions in the pipeline during a clock cycle, all of which
contribute to power consumption during that cycle [QS02].


The statistical relationship between the activity of a circuit and its total power
consumption can be used to infer information about the data being processed,
including sensitive information such as cryptographic keys used in ciphering operations
[KJJ99]. Kocher introduced two key recovery attacks based on this observation:
simple power analysis (SPA) and differential power analysis (DPA). Both passively
extract the entire secret key from implementations of the Data Encryption Standard
(DES) cryptographic algorithm. Kocher’s SPA and DPA techniques are described
in Sections 2.5.2 and 2.5.3.
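
The statistical relationship described above can be illustrated with a small simulation. The sketch below is not Kocher's original difference-of-means DPA; it uses a correlation-based variant with a Hamming Weight leakage model, and it substitutes the 4-bit S-box of the PRESENT cipher for the much larger DES or AES tables. The simulated traces, the Gaussian noise model, and all function names are assumptions made for illustration.

```python
import math
import random

# 4-bit S-box of the PRESENT cipher, used here only as a small stand-in
# for the DES/AES S-boxes discussed in the text.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def hamming_weight(v):
    return bin(v).count("1")


def simulate_traces(key, n, sigma=1.0, seed=0):
    """Simulated leakage: HW(SBOX[p ^ key]) plus Gaussian noise (an assumption)."""
    rng = random.Random(seed)
    plaintexts = [rng.randrange(16) for _ in range(n)]
    traces = [hamming_weight(SBOX[p ^ key]) + rng.gauss(0.0, sigma)
              for p in plaintexts]
    return plaintexts, traces


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def recover_key(plaintexts, traces):
    """Return the 4-bit key guess whose predicted leakage correlates best."""
    def score(guess):
        predicted = [hamming_weight(SBOX[p ^ guess]) for p in plaintexts]
        return pearson(predicted, traces)
    return max(range(16), key=score)
```

A typical use is `pts, tr = simulate_traces(0xB, 2000)` followed by `recover_key(pts, tr)`. With noise-free traces over all 16 plaintexts the correct guess correlates perfectly and every other guess correlates strictly less, so recovery is exact; with noise, more traces are needed.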


2.2.3 Electromagnetic Emissions. The electromagnetic emissions (both radiated
and conducted) produced by electronic devices during normal operations are
physically caused by the time-varying current drawn by the circuit as the transistors
switch on and off. As current flow varies, it induces many tiny time-varying
electromagnetic fields. These fields combine through complex interactions and propagate
as time-varying EM waves via both radiation and conduction through the power
supply and ground lines and other conductive materials [Boa73, AARR02, AARR07].
The fundamental nature of these effects is well understood as described by Maxwell’s
equations [AARR02, AARR07]. The idea that EM emanations leak information has
been known for several decades, although it has only recently become the focus of
significant academic research [Boa73, And01].


Most early EM emissions research was related to the eavesdropping risks of
peripheral devices such as video display units (see Section 2.3.1.1). Van Eck’s
(1985) eavesdropping attack, described in Section 2.3.1.1, was the first
known research on the topic [vL85]. The first published cryptanalytic applications
were [QS02, GMO01], both of which were extensions of Kocher’s timing and power
attacks.²

Quisquater and Samyde were the first to extend Kocher’s power and timing
attacks to the EM side channel [QS00, QS01]. By observing the EM side channel
data they achieved more precise measurements of circuit activity than would be
possible with power analysis short of physically modifying the circuit. Achieving
the same precision of measurements with power analysis would require an invasive
attack to bypass the filtering effects of the power distribution network and peripheral
components.

² The authors of [QS00] refer to their EM variations of Kocher’s attacks as Simple EM Analysis
(SEMA) and Differential EM Analysis (DEMA).


In 2001, Gandolfi et al. published the first concrete examples of EM cryptanalysis
by successfully extracting keys from CMOS hardware implementations of
DES, COMP128, and RSA [GMO01]. Gandolfi’s results reinforced Quisquater and
Samyde’s  claim  that  EM  emanations  allow  more  efficient  DSCA  attacks,  resulting 
in  successful  key  recovery  with  fewer  observations. 

Agrawal et al. more recently published and updated an extensive study of
EM information leakage [AARR02, AARR07]. One of their key findings was that
single wide-band EM sensors capture many unique information leakage signals, each
of which carries distinct information. They also noted that a substantial source of
information leakage is the very weak amplitude-modulated EM information conducted
by surfaces or lines attached to the device. The conducted EM leakage propagates
through  attached  or  nearby  conductive  surfaces  to  include  the  power  and  ground 
lines,  implying  that  side  channel  data  collected  for  power  analysis  attacks  is  likely  to 
contain  weaker  and  higher  frequency  EM  leakage  signals  due  to  unintentional  power 
line  modulation. 

EM emanations caused by a circuit’s operation fall into two primary categories:
direct and indirect emanations [AARR07]. Direct emanations are created by current
flow  through  a  circuit’s  intended  current  path  and  the  resulting  switching  activity  in 
the  circuit’s  transistors  or  other  devices.  Indirect  emanations  are  created  when  small 
couplings  between  densely  packed  electronic  components  modulate  existing  carrier 
signals  emitted  by  the  device,  and  can  modulate  both  unintentional  and  intentional 
EM  signals  transmitted  by  a  device.  Both  types  of  emanations  can  lead  to  side 
channel  information  leakage. 
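
The idea of an indirect emanation, where internal activity amplitude-modulates a carrier such as a clock harmonic, can be sketched with an idealized signal. Everything in the block below is an assumption made for illustration: the function names, the noise-free sinusoidal carrier, the sample counts, and especially the modulation depth of 0.5, which is far stronger than the very weak modulation reported for real devices.

```python
import math


def am_leak(bits, samples_per_bit=100, carrier_period=10, depth=0.5):
    """Carrier whose amplitude is modulated by the bits being processed."""
    signal = []
    for i, bit in enumerate(bits):
        for n in range(samples_per_bit):
            t = i * samples_per_bit + n
            carrier = math.sin(2 * math.pi * t / carrier_period)
            signal.append((1.0 + depth * bit) * carrier)
    return signal


def envelope_demod(signal, samples_per_bit=100):
    """Recover bits by rectifying and averaging over each bit period.

    The threshold is the midpoint of the observed levels, so the input
    must contain at least one 0 bit and one 1 bit.
    """
    levels = []
    for start in range(0, len(signal), samples_per_bit):
        window = signal[start:start + samples_per_bit]
        levels.append(sum(abs(s) for s in window) / len(window))
    threshold = (min(levels) + max(levels)) / 2
    return [1 if level > threshold else 0 for level in levels]
```

Applying `envelope_demod` to the output of `am_leak` returns the original bit sequence in this noise-free setting. A practical receiver would additionally need filtering around the carrier and averaging over many observations.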

Through the analysis of leaked EM information, Agrawal et al. claimed successful
attacks against various cryptographic implementations. Additionally, they
posit the possibility of more powerful attacks using multiple complementary
electromagnetic sensors with different characteristics to simultaneously collect side channel
information leakage from multiple signals.


Agrawal’s most interesting experimental result is that strong information-bearing
signals (particularly carriers at multiples of the system’s clock frequency) propagate
well beyond the near-field: beyond 50 feet for an unprotected commercial
secure socket layer (SSL) accelerator card operating inside an unmodified, closed
computer server [AARR02, AARR07, Roh06]. Practical template attacks on such
a device with no countermeasures were possible at a distance of approximately 15
feet [AARR02, AARR07]. Mangard reported similar results and demonstrated extraction
of a cryptographic key from a smart card at a distance of more than 5 m in
an unshielded environment with an unoptimized measurement setup [Man03]. The
threat from these types of attacks may actually be greater because an adversary
could potentially place an inconspicuous receiver close to the target device (within
5 ft.), which could relay the side channel data of interest to a remote processing
station [Roh06]. The signal strength at that distance may be sufficient to enable
single-observation template attacks and advanced signal processing techniques capable
of defeating some SCA countermeasures.

A potentially even more powerful technique, the illumination of an integrated
circuit by an external carrier, was published by Burnside et al. [BEA08]. The
targeted  integrated  circuit  is  illuminated  with  a  low-power  external  RF  source,  and 
the  data-dependent  behavior  is  modulated  on  the  reflected  external  signal  which  is 
then  captured.  The  authors  speculate  that  this  technique  may  result  in  effective 
side  channel  attacks  at  greater  distances  than  those  based  on  the  native  emissions 
produced  by  the  target  device,  but  no  experimental  results  were  provided  to  support 
that  hypothesis. 
2.2.4 Acoustic Emissions. The acoustic emissions of devices are also a
source of side-channel information leakage. Recently declassified lectures on the
History of Communications Security reveal that intelligence agencies were concerned
with acoustic information leakage from mechanical cryptographic devices as early as
the 1940s, even over noisy telephone lines when a phone receiver was operated in
the same vicinity as the sensitive equipment [Boa73].


Various peripheral electronic devices also reportedly leak information. For
example, the sound of keyboard (or other keypad) presses is correlated with what a
user is typing [AA04, ZZT05]. However, few other peer-reviewed publications exist
on the subject. Recently, a similar attack was demonstrated using the sound of a
dot matrix printer to identify the characters printed [Gib09]. Dot matrix printers
are still widely used in financial and medical applications, which makes the existence
of such a vulnerability a relevant practical concern.


Most surprising, however, is that electronic devices such as general purpose
microprocessors actually leak information through their acoustic emanations. Shamir
and Tromer have performed some initial, unpublished experiments indicating that
the acoustic emissions of commercial microprocessors are sufficient to determine with
good precision when specific operations begin and end. Further analysis could lead
to effective timing attacks on cryptographic devices [ST04].


2.3 Applications

The vast majority of side channel research to date deals with the security
vulnerabilities introduced by unintentional information leakage. Side channel
vulnerabilities range from simple eavesdropping risks of peripherals, such as video displays
or keyboards, to the inadvertent disclosure of the cryptographic keys critical to the
security of entire systems.

Beyond the various cryptanalytic attack modes, side channel analysis has also
been investigated for several other applications including reverse engineering,
hardware covert channel circuitry, and the detection of unauthorized modifications or
additions to circuits (Trojans) during outsourced chip fabrication.

This  section  provides  an  overview  of  the  different  applications  of  side  channel 
analysis  investigated  over  the  past  three  decades  of  research  in  this  area. 

2.3.1 SCA Attacks. The term SCA attack (often just SCA in the
literature³) is commonly used to describe any use of side channel information to either:


1.  Bypass  or  compromise  the  security  mechanisms  of  a  system,  or 

2.  Infer  information  about  the  internal  state,  data  or  operations  of  a  device  that 
is  not  intended  to  be  accessible. 


SCA attacks are a subset of the more general class of implementation
attacks, which exploit vulnerabilities in a device’s physical implementation rather than
attacking, for instance, the mathematical strength of a cryptographic algorithm.
Other common implementation attacks are fault attacks and physical tampering
techniques [ZF05]. SCA attacks differ from many other implementation attacks in
that they are generally passive, meaning they can be implemented without revealing
the attacker’s presence and without damaging or otherwise physically affecting the
system of interest [And01]. This is in contrast to many other known implementation
attacks that are either active (risking disclosure of the attacker’s presence or intent)
or invasive (risking damage or triggering of a circuit’s tamper-resistant features).


The  two  most  prevalent  SCA  attacks — eavesdropping  and  key  recovery  attacks — 
are  described  below. 


2.3.1.1 Eavesdropping. Eavesdropping attacks on electronic devices
are analogous to eavesdropping on human conversations. Both involve an adversary,
Eve, secretly listening to what is intended to be a private conversation between an
originator (Alice) and a recipient (Bob). A typical traditional eavesdropping attack
might involve Eve tapping into video display cabling for a computer monitor or
television display to see the same picture Bob is seeing.

³ The acronym SCA is used interchangeably throughout the literature to refer to both side
channel analysis and side channel attacks. For clarity, the term SCA is used to describe the more
general side channel analysis. Side channel attacks are referred to as SCA attacks.

SCA  eavesdropping  attacks  differ  from  classical  eavesdropping  in  that  they  do 
not  target  the  primary  or  intended  communications  channel.  For  example,  returning 
to  the  scenario  of  a  video  display  signal,  in  the  SCA  eavesdropping  attack  Eve 
would  not  directly  tap  into  the  primary  video  display  signal.  Rather,  she  would 
attempt  to  reconstruct  the  video  signal  from  an  alternate,  unintended  side  channel 
of  information  such  as  EM  emanations  produced  by  the  system’s  cabling.  Eve  may 
prefer  a  side  channel  attack  to  the  direct  eavesdropping  approach  for  a  number  of 
reasons  such  as  restricted  access  to  the  system  or  tamper-resistant  cabling. 


The first published eavesdropping attack to exploit unintentional emissions
was by Wim van Eck [vL85]. Cathode ray tube (CRT) video terminals produce
electromagnetic emissions that can be remotely intercepted and reconstructed by an
eavesdropper at a substantial distance, even through walls. The attack is carried
out using inexpensive commercially available receiver technology to passively view
a real-time reproduction of what is being displayed on a remote target computer.
More recently, van Eck’s attack has been extended to show it is still relevant to
modern-day flat panel technology [Kuh04].


Eavesdropping SCA attacks typically target peripherals or human interface
devices. Attacks have been published that target the electromagnetic emanations of
computer displays and keyboards [VP09], the flashing of light-emitting diode (LED)
activity indicators on computer peripherals such as modems [LU02], and the acoustic
activity of keyboards and printers [AA04, ZZT05], among others.


Eavesdropping attacks are not the principal focus of this research, and the
details of these attacks are not covered further here. Additional details can be found
in [vL85, Kuh04, VP09].


2.3.1.2 Cryptanalysis and Key Recovery Attacks. The term side
channel cryptanalysis refers to the practical application of SCA to break or bypass
cryptographic implementations, with a typical objective being the recovery of a
secret key from a device [MOP07, ZF05, LR09]. Whereas SCA eavesdropping attacks
target side channel information leakage that reveals the same information as an
intended input or output, side channel cryptanalysis attacks attempt to gain
knowledge of the internal state of a device or the data it is processing by analyzing the
information leakage from the implementation.


SCA cryptanalysis relies on the fact that cryptographic systems are not
manifested in physical form as a pure mathematical function. Mathematically, an
encryption operation is a function $E_K[P] \rightarrow C$, where $K$ is the key, $P$ is the plain-text or
data being encrypted, and $C$ is the intended output or cipher-text [Sch96].
However, in reality, a physical system implements the encryption algorithm as
$E_K[P] \rightarrow (C, S_1, S_2, \ldots, S_n)$, where $S_i$ is any additional information unintentionally leaked by
the system through one of $n$ side channels.


For  electronic  systems,  a  fundamental  reason  that  information  leakage  occurs 
is  that  circuits  must  perform  a  variety  of  intermediate  steps  to  produce  a  desired 
final  output  value  from  a  particular  input.  When  viewed  at  the  transistor  level,  even 
fundamental  logic  cells  such  as  an  XOR  gate  produce  intermediate  signals  in  the 
propagation  path  to  the  output. 

In  general,  systems  and  components  are  designed  in  such  a  manner  that  only 
the  final  output  is  intended  to  be  externally  visible.  However,  although  the  results  of 
each  intermediate  computation  are  not  normally  observable,  some  information  about 
their  activity  and  content  is  leaked  through  the  side  channels  they  produce. 


Definition 2 (Intermediate value) Any non-final result produced during the
intermediate steps of a computation.

Consider a system based on Algorithm 2 below. This algorithm can be viewed
as a simple, albeit very weak, cipher that encrypts data by adding it to the Hamming
weight of the key, K. The system’s security depends on keeping the contents of the
key, K, a secret from outside observers or malicious agents. Assume K is stored in
some internal memory location inaccessible to outside observers.


Algorithm 2 Example of Intermediate Value Leakage

Require: Input X
1: T ← X
2: for i = 0 to 31 do
3:     if K_i = 1 then
4:         T ← T + 1
5:     end if
6: end for
return T


There are at least three potential sources of information leakage in this
algorithm:


1.  The  comparison  operation  at  line  3  directly  accesses  a  key  bit  during  each  loop 
iteration. 


2.  The  algorithm’s  execution  path  during  each  loop  iteration  is  conditional  on  a 
key  bit. 

3. The intermediate value T is manipulated at line 4 whenever the key bit is a
‘1’.


Due to a variety of phenomena described in Sections 2.2 and 2.5, each of these
actions may cause variations in the physically observable characteristics of the device
that are statistically related to the operation or data being manipulated. Analysis
of this side channel data may reveal K_i directly or, more subtly, the value or change
in state of T, which indirectly reveals one bit of K. As a result, the secret key K
may be revealed, compromising the security of the system that depends on it.
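The leakage described above can be illustrated with a minimal Python sketch of Algorithm 2. The `trace` list is a hypothetical stand-in for a physical side channel: it simply records whether the conditional addition on line 4 was executed in each iteration. The function names, the 32-bit key, and the bit ordering (K_i taken as bit i of an integer) are assumptions made for illustration only.

```python
def weak_cipher(x, key, trace):
    """Algorithm 2: add the Hamming weight of a 32-bit key to the input x.

    `trace` is a hypothetical observation log standing in for a side
    channel; it records which branch each loop iteration took.
    """
    t = x
    for i in range(32):
        key_bit = (key >> i) & 1      # line 3: direct access to a key bit
        if key_bit == 1:
            t += 1                    # line 4: executed only for '1' bits
            trace.append("add")
        else:
            trace.append("skip")
    return t


def recover_key_from_trace(trace):
    """Rebuild the key from the observed sequence of executed operations."""
    key = 0
    for i, operation in enumerate(trace):
        if operation == "add":
            key |= 1 << i
    return key


secret_key = 0xDEADBEEF               # arbitrary example key
observed = []
ciphertext = weak_cipher(100, secret_key, observed)
recovered_key = recover_key_from_trace(observed)
print(hex(recovered_key))
```

The output value alone reveals only the Hamming weight of the key, whereas an observer who can distinguish the two execution paths recovers every key bit. A real attack would have to infer the "add" and "skip" labels from noisy physical measurements.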


For cryptographic or other secure systems, the implications are far-reaching.
Since almost all security mechanisms in electronic and computer systems rely on
preserving the secrecy of some sensitive information, any attack that can bypass those
mechanisms is of great concern. Whereas many traditional cryptanalytic attacks are
of theoretical interest but are not practical threats, recent research has shown side
channel leakage to be of great practical concern.


In 1996, Kocher proposed the first known attack capable of using side
channel leakage (timing information) to extract an entire key from implementations of
several cryptographic algorithms [Koc96]. Since the publication of Kocher’s
original paper, numerous new cryptanalytic side channel attacks and improvements
have been published targeting a wide range of cryptographic algorithms and
protocols. Successful practical attacks have been mounted against hardware and
software implementations of block, stream, and public-key algorithms including
DES, AES, Rivest-Shamir-Adleman (RSA), and various elliptic curve schemes
[Koc96, KJJ99, Roh06, And01]. Mangard et al. published an entire text dedicated to
SCA attacks on AES implementations [MOP07]. Brumley and Boneh demonstrated
a practical remote attack against an OpenSSL-based web server across a local area
network [BB05].


Recently, a complete practical attack on the KeeLoq encryption scheme has
been published that is capable of extracting the master key from a device in a single
observation [KKMP09]. KeeLoq is a ubiquitous remote keyless entry protection algorithm
used in a wide variety of commercial keyless entry systems for vehicles and garage
door openers. The publication of a simple and efficient attack against such a widely
used system amplifies the practical threat of SCA.


Section 2.5 reviews the major classes of SCA cryptanalytic techniques
developed to date. In response to these discoveries, a variety of countermeasures have
been introduced to reduce the vulnerability of electronic systems to each new type
of attack. Section 2.6 reviews known countermeasures.

2.3.2 Reverse Engineering. The idea of using SCA for reverse
engineering applications dates to Kocher et al.’s original (1998) DPA paper, in which the
authors claim to have used side channel analysis techniques to reverse engineer unknown
algorithms and protocols. The authors posited that automation of reverse engineering
of unknown systems might be possible, although no details of such an approach were
given. More recently, some success has been realized using side channel leakage to
manually facilitate reverse engineering and extraction of keys from unknown
algorithms. This application of SCA has been termed side channel analysis for reverse
engineering (SCARE) [Nov03].


The  first  explicit  use  of  SCA  for  reverse  engineering  in  the  academic  literature 
was by Quisquater et al. [QS02]. The authors show that it is possible to sufficiently
characterize  the  electromagnetic  side  channel  emissions  of  a  microprocessor  to  create 
an  instruction  template  dictionary.  Future  observations  of  the  EM  emissions  from  an 
identical  microprocessor  can  be  automatically  classified  to  determine  what  operation 
was  being  performed. 


Novak (2003) proposed an attack, with later improvements by Clavier (2004,
2007), to reverse engineer non-trivial portions of an unknown A3/A8 cryptographic
algorithm [Nov03]. The targeted algorithm is the COMP128-2 cipher used in
cellular phone SIM cards to authenticate and generate keys in GSM networks [Roh06].
Clavier improved on Novak’s original technique and demonstrated a more
practical reverse engineering approach capable of retrieving lookup tables of an
unknown cryptographic algorithm without requiring any prior knowledge of the secret
key [Cla04, Cla07]. Daudigny et al. (2005) used SCARE to recover unknown details
of a DES implementation [DLMV05].


Early reverse engineering attempts were applied specifically to software-based
microprocessor implementations. The first extension of the technique to an unknown
hardware implementation was in 2008 [RDG+08].

In all of these cases domain knowledge is used extensively, but the passive
reverse engineering capability is nevertheless interesting. In many scenarios, the
non-invasive nature of side channel analysis provides clear advantages over more
invasive reverse engineering techniques. Reverse engineering applications of SCA
are still a relatively undeveloped area of investigation, and it is currently unknown
whether it is possible to generalize and automate the process as envisioned by Kocher
et al. [KJJ99, Roh06]. However, the technique’s existence is another reason system
designers should not rely on security by obscurity.


2.3.3 Covert Channel Engineering. Researchers in high-assurance
computing realized early on (at least by 1973) that timing variations can be used to
intentionally pass information across a controlled perimeter [Lam73, van90, Wra91, And01].
For example, a malicious Trojan program can be placed inside the controlled
perimeter of a secure computing environment where it captures data and re-transmits it via
an encoded communications channel [Lam73, van90, Wra91, And01]. Such a
communications channel can be created by intentionally varying some system characteristic
visible to an observer outside the controlled perimeter. For example, the Trojan
process can cause page faults or variations in CPU demand or disk cache loading. A
receiving process outside the perimeter can monitor the timing variations and decode
the message. This type of scheme is known as a covert timing channel [And01].
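The covert timing channel idea can be sketched in a few lines of Python. This is a simulation under stated assumptions, not a description of any published Trojan: the two delay values, the jitter model, the threshold decoder, and all function names are hypothetical, and delays are generated as numbers rather than measured from a real system.

```python
import random

SHORT_MS, LONG_MS = 1.0, 3.0               # assumed delays encoding '0' and '1'
THRESHOLD_MS = (SHORT_MS + LONG_MS) / 2    # receiver's decision threshold


def to_bits(data):
    """Expand bytes into a list of bits, most significant bit first."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def from_bits(bits):
    """Pack a list of bits (MSB first) back into bytes."""
    out = bytearray()
    for k in range(0, len(bits), 8):
        value = 0
        for bit in bits[k:k + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)


def trojan_send(data, jitter_ms=0.3, seed=0):
    """Inside the perimeter: encode each bit as a short or long delay.

    Bounded random jitter models unrelated system activity.
    """
    rng = random.Random(seed)
    return [(LONG_MS if bit else SHORT_MS) + rng.uniform(-jitter_ms, jitter_ms)
            for bit in to_bits(data)]


def observer_receive(delays):
    """Outside the perimeter: threshold the observed delays to recover bits."""
    return from_bits([1 if d > THRESHOLD_MS else 0 for d in delays])


observed_delays = trojan_send(b"key")
print(observer_receive(observed_delays))
```

Because the assumed jitter is smaller than the gap between the two delay values, the receiver decodes the message without errors. A practical channel would face larger noise and would need synchronization and error correction.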


Circuit designs or embedded software code can be modified to surreptitiously
broadcast secret information over a covert wireless channel (such as the EM side
channel) to a nearby receiver [DS06, Dyr07]. There are no concrete examples in the
literature to date of such a hardware-based malware attack having actually occurred,
but there are numerous examples of commercial hardware shipping with
software-based malware. Several recent papers have been published with proof-of-concept
designs that illustrate the feasibility of such an attack [KTC+08, LKG+09].

2.3.4 Trojan Detection. The economic realities of semiconductor and
microchip manufacturing have resulted in a large-scale migration of most integrated
circuit (IC) fabrication to facilities outside the United States. In 2005, a special
U.S. Defense Science Board Task Force examined the effects and risks of outsourcing
high performance microchip production to foreign countries [Off05] and concluded
that this practice introduces risks that are unacceptable for domestic defense and
intelligence applications. The Defense Science Board determined that existing diagnostic
capabilities are insufficient to assure that chips produced by foreign foundries are
unmodified. One particular risk cited is the opportunity for malicious foreign agents
to introduce unauthorized design changes such as Trojan horses into the chips, as
described above, to establish covert channels of communication.

Agrawal et al. proposed extending side-channel analysis techniques to detect
the presence of hidden or modified circuitry in integrated circuits [ABK+07]. The
approach constructs SCA fingerprints by profiling the power, temperature and
electromagnetic side channel characteristics of an integrated circuit. After fingerprinting,
destructive techniques are used to verify the authenticity of the characterized chips;
i.e., to confirm that they have not been modified from their original intended design.
Remaining ICs are statistically tested against the fingerprints of the known good chips. The
approach clearly requires that the destructive verification techniques be extremely
reliable in detecting modifications to assure that the baseline is truly an unmodified
chip. This may not be practical in light of the Defense Science Board findings.


2.4  Adversary  Models 

When discussing SCA attacks, various assumptions must be made about the
capabilities of an adversary. It is common to refer to how powerful a hypothetical
adversary must be to carry out a particular attack in practice. In real-world
situations, an adversary’s capabilities can range from having very restricted access to
the device at one extreme, to having complete control of the device or system being
attacked at the other.


A  powerful  adversary  is  one  that  has  complete  control  of  the  device  or  system 
being  attacked.  In  this  scenario,  the  adversary  can  arbitrarily  choose  the  number 
and  content  of  inputs  the  device  will  process,  can  capture  both  the  inputs  and 
outputs  of  the  device,  and  is  assumed  to  be  in  an  environment  (such  as  a  well- 
equipped  laboratory)  where  the  quality  of  the  side  channel  data  being  collected  may 
be  optimized.  In  the  most  extreme  case,  the  adversary  may  even  have  the  capability 
to  load  new  keys  into  the  device.  The  only  significant  limitation  is  that  stored  keys 
cannot  simply  be  retrieved  as  that  is  prohibited  by  most  cryptographic  devices. 


A  weak  adversary  has  very  limited  capabilities  to  control  and  observe  the  device 
being  analyzed.  In  the  context  of  SCA  attacks,  Eve  still  has  the  capability  to  collect 
side  channel  data  in  some  form  (possibly  noisy  and  otherwise  suboptimal).  Most 
attacks  also  require  that  she  be  able  to  capture  the  contents  of  either  the  input 
or  output  from  the  device  (typically  the  plain-text  or  ciphertext  in  the  case  of  a 
cryptographic  device). 


Many SCA attacks assume a powerful adversary, which implies some reduction
in practicality under certain conditions. However, this does not negate the real-world
threat since there are many scenarios where such a powerful adversary truly exists,
and the attacks are easily within the reach of a moderately equipped malicious agent.
The publication of actual SPA and DPA attacks on the KeeLoq remote keyless entry
system [KKMP09] and the Xilinx FPGA bitstream protection [MBKP11, MKP11]
amplifies this point. Furthermore, many attacks such as profiling techniques (see Sec.
2.5) have very weak assumptions about the adversary from the perspective of access
to the device being attacked.

2.5 Techniques

Over  the  past  decade,  dozens  of  distinct  SCA  attacks  and  countermeasures 
have  been  developed.  Most,  if  not  all,  are  derived  from  Paul  Kocher’s  original 
timing and power analysis attacks [Koc96, KJJ99]. Because of the large number of
distinct  attacks  that  have  been  developed,  it  is  impractical  to  describe  each  in  detail. 
However,  most  of  the  techniques  are  similar  and  can  be  generally  described  according 
to  their  basic  principles  of  operation.  Two  of  the  most  important  distinguishing 
characteristics  used  to  categorize  SCA  techniques  are  the  analysis  approach  and  the 
number  of  stages  involved  in  the  analysis. 

The  following  definitions  are  used: 


Simple SCA (SSCA). Any technique that directly interprets side channel
measurements, typically through visual analysis of a captured signal in the time or
frequency domain [KJJ99]. Generally applied to one or only a small number
of observations. SSCA is a generalization of Kocher’s SPA technique to any
source of information leakage.

Differential SCA (DSCA). Any technique that uses statistical or other
mathematical techniques to look for small differences in side channel data that may
be correlated to the data or operations of interest. DSCA is a
generalization of Kocher’s DPA technique to other sources of information leakage and
statistical analysis techniques [KJJ99]. DSCA is usually applied to a
relatively large number of observations, varying from a few dozen to a million or
more [MOP07, Roh06].


Profiling SCA. Any SCA technique that requires multiple data-collection or
analysis steps, such as a profiling stage using a training device followed by an attack
stage on a separate target device.
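To make the DSCA definition above concrete, the following Python sketch simulates a correlation-based analysis. It is an illustrative toy, not Kocher's DPA procedure: the leakage is assumed to equal the Hamming weight of (input XOR key byte) plus Gaussian noise, and the trace count, noise level, key value, and function names are all hypothetical choices.

```python
import random


def hamming_weight(value):
    """Number of '1' bits in an integer."""
    return bin(value).count("1")


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_observations(secret_byte, count, noise_sigma, rng):
    """Assumed leakage model: HW(input XOR key) plus Gaussian noise."""
    inputs = [rng.randrange(256) for _ in range(count)]
    leaks = [hamming_weight(p ^ secret_byte) + rng.gauss(0.0, noise_sigma)
             for p in inputs]
    return inputs, leaks


def recover_key_byte(inputs, leaks):
    """Pick the key guess whose predicted leakage correlates best."""
    def score(guess):
        predicted = [hamming_weight(p ^ guess) for p in inputs]
        return pearson(predicted, leaks)
    return max(range(256), key=score)


rng = random.Random(1)
known_inputs, noisy_leaks = simulate_observations(0x3C, 1000, 1.0, rng)
best_guess = recover_key_byte(known_inputs, noisy_leaks)
print(hex(best_guess))
```

No single observation identifies the key byte here because of the added noise; the correct guess only stands out statistically across many observations, which is the distinguishing feature of DSCA relative to SSCA.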

This section reviews the fundamental SCA techniques developed to date:
timing attacks, simple SCA attacks, differential SCA attacks, and profiling attacks.
Each published SCA attack is unique in its approach, and most are specific to a
particular algorithm or implementation platform. Timing attacks are presented first and
separately because they provide the initial foundation for the application of SCA to
cryptanalysis, and Kocher’s original attack does not fit neatly into the definition of
the other fundamental techniques that are the subject of widespread research today.
A brief overview of more advanced variations is also provided, including profiling,
higher-order DSCA, and advanced signal processing techniques that can be applied
to aid in pre- or post-processing of captured side channel emissions data.


2.5.1 Timing Analysis. Timing analysis attacks exploit the small
variations in the computation time of a cryptographic implementation described in
Section 2.2.1. Asymmetric cryptographic algorithms in particular are known to have
non-constant execution times due to their conditional execution of time-consuming
operations such as multiplications [Koc96]. These non-constant times, together with
knowledge of the underlying algorithm or implementation, allow an adversary to
infer information about the sensitive data being processed. Although the idea of
timing attacks is generally applicable to any cryptographic implementation that
operates in non-constant time, the details of any particular attack are implementation
specific and there is no generalized process for constructing such an attack.
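The source of the timing variation can be seen in a short Python sketch of left-to-right square-and-multiply exponentiation. This is only an illustration of why execution time depends on the exponent, not an implementation of Kocher's attack; the operation counter is a hypothetical proxy for execution time, and the function name is an assumption.

```python
def modexp_with_cost(base, exponent, modulus):
    """Left-to-right square-and-multiply modular exponentiation.

    Returns (result, cost), where cost counts modular multiplications
    and serves as a crude stand-in for execution time.
    """
    result = 1
    cost = 0
    for bit in bin(exponent)[2:]:
        result = (result * result) % modulus    # always square
        cost += 1
        if bit == "1":                          # multiply only for '1' bits
            result = (result * base) % modulus
            cost += 1
    return result, cost


# Two exponents of equal bit length but different Hamming weight.
_, sparse_cost = modexp_with_cost(7, 0b10000001, 1009)
_, dense_cost = modexp_with_cost(7, 0b11111111, 1009)
print(sparse_cost, dense_cost)
```

The exponent with more ‘1’ bits requires more multiplications, so the total running time leaks information about the secret exponent. Constant-time implementations avoid this by making the sequence of operations independent of the key bits.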

Kocher’s timing attack [Koc96] on RSA and other asymmetric public-key
algorithms is widely considered to be the first side-channel cryptanalysis in the
scientific literature. Kocher’s attack targets the non-constant execution time of a
microprocessor-based RSA algorithm implementation. The attack extracts the
private key from a cryptographic device by analyzing the small variations in
computation time of a large number of decryption operations. Kocher validated the attack
against RSA Laboratories’ reference implementation [Koc96, Lab94]. Kocher’s
timing attack on asymmetric ciphers is described in detail in Appendix B.

Several practical attacks have also been published to demonstrate the feasibility
of extracting a cryptographic key remotely across a computer network based on the
observed differences in computation or response times, including those variations
introduced by cache misses (cf. [BB05, PH98, Bct05, KSWH98, KSWH00, Pag02]). It
is  noteworthy  that  even  lookup-table-based  approaches  to  cryptography,  which,  on 
the  surface  would  appear  to  take  constant  time,  can  be  vulnerable  to  timing  attacks 
due  to  variations  related  to  cache  misses. 


2.5.1.1 Final Notes. Kocher’s original analysis targeted
cryptographic algorithms based on modular exponentiation. However, he asserts that any
cryptographic implementation that runs in non-constant time is likely to be
vulnerable to timing attacks. The plethora of attacks that have been published over the last
decade reinforces this claim, and investigators continue to identify new examples.


2.5.2 Simple Analysis. In the SSCA class of techniques, an analyst
examines (typically visually) measured side channel data looking for distinct features
that reveal information about the operations being performed or data being
processed. The analyst is, in effect, performing a high-level reverse engineering based
on knowledge of the algorithm that produced the side channel leakage. The concept
is not limited to the power side channel (e.g., Kocher’s DES attack [KJJ99]), and can
potentially be applied to the data obtained from any of the side channels described
in Section 2.2.


When side channel data is visually analyzed in the time domain, features like
shape and magnitude can look very different for different operations, or for the same
operation performed with different data or outcomes. A well-known example is the
power consumption for conditional branch instructions in microcontrollers, which
often depends on whether the branch is taken or not [KJJ99]. The power trace of
a device that takes a branch may look significantly different from the trace of the
same device when it does not take the branch. By analyzing these differences and
applying knowledge of the underlying algorithm, it is possible to deduce the content
of the data being processed, generally some sensitive private data of interest such
as a portion of a cryptographic key.


2.5.2.1 Simple Attack on a DES Parity Check. The level of difficulty
involved in applying SSCA to extract data from a system varies greatly. An extreme
example of a trivial attack on a cryptographic key focuses on the key loading
mechanism rather than the cryptographic algorithm itself [Roh06]. When a key is loaded
into memory from a storage location, it is desirable to verify the key’s integrity
before it is used. One way to do this is to store a parity bit with each 7 data bits. The
data parity is verified when the key is initially loaded.


Algorithm 3 DES parity-check algorithm [Roh06]
Require: Key K (8 bytes, each with 7 data bits and 1 parity bit).
1: for i = 8 downto 1 do
2:     parity ← 0
3:     for j = 8 downto 1 do
4:         if bit j of Key[i] = 1 then
5:             parity ← parity + 1
6:         end if
7:     end for
8:     if Even(parity) then
9:         ParityError()
10:    end if
11: end for
return


Algorithm 3 illustrates an implementation of such a parity checking algorithm.
The algorithm examines the value of each data bit and adds one to the parity if the
bit is a ‘1’. Line 4 of the algorithm executes a conditional branch decision based
on each data bit.
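A minimal Python rendering of the parity check described above shows why the branch on each bit is so revealing. The `trace` list is a hypothetical idealized side channel that records whether the addition was executed; the byte and bit ordering, the boolean return value in place of ParityError(), and the function names are assumptions made for this sketch.

```python
def des_parity_check(key, trace):
    """Check that each of the 8 key bytes has odd parity.

    `trace` stands in for a side channel: it records 1 when the parity
    increment is executed and 0 when it is skipped.
    Returns True if no parity error was found.
    """
    ok = True
    for i in range(7, -1, -1):              # bytes 8 down to 1
        parity = 0
        for j in range(7, -1, -1):          # bits 8 down to 1 (MSB first)
            if (key[i] >> j) & 1:
                parity += 1                 # executed only for '1' bits
                trace.append(1)
            else:
                trace.append(0)
        if parity % 2 == 0:                 # an even count is a parity error
            ok = False
    return ok


def key_from_trace(trace):
    """Reassemble the key from the observed branch decisions."""
    key = bytearray(8)
    for n, bit in enumerate(trace):
        byte_index = 7 - n // 8             # same byte order as the check
        key[byte_index] = (key[byte_index] << 1) | bit
    return bytes(key)


example_key = bytes.fromhex("0123456789abcdef")   # each byte has odd parity
branch_trace = []
parity_ok = des_parity_check(example_key, branch_trace)
print(parity_ok, key_from_trace(branch_trace).hex())
```

In this idealized setting the 64 recorded branch decisions are exactly the key bits, so the integrity check intended to protect the key discloses it in a single run. A physical trace would require the analyst to classify each peak as a ‘0’ or a ‘1’.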


Figure 2.2 shows how trivial an attack on this type of implementation can be.
The inner and outer loop structure is immediately apparent from the peaks. The data
shown here includes the execution of three outer loops, each of which is composed of
eight inner loops. The differences between the inner loop iterations where the data
bit is a ‘1’ and those where it is a ‘0’ are also clearly apparent. When the data bit
is a ‘0’, the addition step on Line 5 is skipped, and the current drawn by the circuit
peaks and drops off rapidly. When the data bit is a ‘1’, however, the current drawn
first drops off and then increases again for a short time. Thus, it is trivial to “read”
the entire secret key by directly examining the current trace and assigning each peak
a value of ‘0’ or ‘1’. It is noteworthy that a timing attack is also possible since
the inner loop appears to take slightly longer when the data bit is a ‘1’ than when
it is a ‘0’.

Figure 2.2 Rohatgi’s SPA Attack Against DES Key Parity Check [Roh06]


Despite  the  simple  label,  most  SSCA  attacks  are  significantly  more  subtle 
and  complex  than  the  parity  example.  Kocher’s  DES  attack,  for  instance,  requires 
substantial  insight  into  the  operation  of  the  underlying  cryptographic  algorithm  to 
extract  the  key  from  a  device. 


Information may leak from a wide variety of operations and can vary based on
the data being manipulated as well. For example, on some microprocessor-based
devices a load register instruction reveals the Hamming weight or Hamming distance
of the data register [MDS99]. For such devices, the power consumed is proportional
to the number of ‘1’ bits being processed, or transferred across a bus.

Figure 2.3 Hamming distance power leakage from an 8-bit smart-card
micro-controller performing a load register operation [MDS99]


Definition 3 (Hamming weight) is the number of non-zero bits in a binary
vector of finite length [MOP07].


Definition 4 (Hamming distance) between two binary vectors of the same length
is the number of coordinates in which the two vectors differ [MacOSl].
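As a brief illustration of Definitions 3 and 4, both quantities can be computed on integers treated as bit vectors. The function names below are chosen for this sketch only.

```python
def hamming_weight(x):
    """Number of non-zero bits in the binary representation of x."""
    return bin(x).count("1")


def hamming_distance(a, b):
    """Number of bit positions in which a and b differ."""
    return hamming_weight(a ^ b)


# A register that changes from 0x0F to 0xF0 flips all eight bits,
# even though both values have the same Hamming weight.
weight_before = hamming_weight(0x0F)
weight_after = hamming_weight(0xF0)
flipped = hamming_distance(0x0F, 0xF0)
```

The example shows why the two leakage models are not interchangeable: the weight is unchanged by the transition while the distance is maximal.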


Figure 2.3 is the result of an SCA experiment on a vulnerable 8-bit smart-card
that clearly leaks the Hamming distance of the data being loaded into a register. Such
leakage is significant because for some cryptographic implementations, knowledge of
the Hamming weight or distance of a portion of the cryptographic key is sufficient
to make a brute-force key search or mathematical cryptanalysis of the remaining
possibilities computationally practical [MDS99].
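The reduction in search space can be made concrete with a short calculation. The sketch below counts the candidates that remain for an n-bit word once its Hamming weight is known; the 16-byte key with a known weight per byte is a hypothetical scenario used only for illustration and is not taken from [MDS99].

```python
from math import comb, log2


def candidates_given_weight(n_bits, weight):
    """Number of n-bit values that have the given Hamming weight."""
    return comb(n_bits, weight)


def remaining_key_bits(n_words, n_bits, weight):
    """Effective key length (in bits) if every word has the known weight."""
    return n_words * log2(candidates_given_weight(n_bits, weight))


# Worst case for a single byte: the weight that leaves the most candidates.
worst_case_byte = max(candidates_given_weight(8, w) for w in range(9))

# Hypothetical 16-byte key whose bytes all leak the least helpful weight (4).
worst_case_key_bits = remaining_key_bits(16, 8, 4)
```

Even in the least favorable case a leaked byte weight removes almost two bits of uncertainty per byte, and weights near 0 or 8 reveal the byte almost completely.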


2.5.2.2 Practicality of SSCA Attacks. A significant practical
consideration for SSCA techniques is that they require substantial knowledge of how a
particular cryptographic algorithm is implemented. Although the need for
implementation details appears to be a severe limitation, numerous researchers assert that SSCA
techniques are still quite effective in practice unless the system implements specific
countermeasures to prevent them [KJJ99, Roh06, MOP07]. Careful inspection of
the side channel leakage may actually reveal sufficient information about the
implementation to allow further refinement of an attack, or sufficient information for the
attacker to infer important details about how the system is implemented.


SSCA attacks are most effective against sequential implementations of
algorithms. Use of SSCA to attack a fully pipelined hardware implementation of DES,
for instance, would be difficult because the parallel circuit activity would significantly
hinder direct visual interpretation [MOP07]. However, even if SSCA techniques are
insufficient to extract information from a side channel signal directly, they are
frequently useful to help guide more advanced techniques and to infer information about
the underlying implementation or algorithm being executed.


2.5.3 Differential Analysis. The second power analysis technique
introduced in [Koc96] is known as differential power analysis (DPA). DPA is a statistical
technique that correlates the effects of the data being manipulated to the power
consumption trace, rather than inferring the data from knowledge of
what operations were performed as in SPA. Whereas SPA can be performed using
only a single power trace for an encryption device, DPA requires multiple power
traces to make statistical inferences. DPA is capable of extracting information due
to variations that are too subtle to be identified through direct examination of the
data by SSCA. The technique is not unique to the power side channel, and can
be generalized to the physical leakage due to other phenomena as differential SCA
(DSCA) [KJJ99, ZF05, MOP07, Roh06]. A key assumption of most DPA attacks is
that the attacker has essentially unfettered access to the physical device under attack
and can cause it to perform encryption or decryption operations at will.


DSCA techniques use statistical properties of multiple samples, which reduces
noise from measurement error or non-relevant circuit activity (either intentional or
unintentional) while amplifying the effects of the circuit performing the operation or
manipulating the data of interest. With a sufficiently large number of observations,
it is possible to identify very small correlations between the internal device state and
the leaked side channel information. Various mathematical techniques are used to
identify these correlations, as described in Section 2.5.3.1.


The common characteristic of all DSCA techniques is that they use statistical
methods to find small differences in the leaked side channel information due to variations
in processed data or operations over a relatively large number of operations. The
number of observations required to successfully apply DSCA techniques ranges from
dozens to millions depending on the implementation, technique, environmental
factors, and any countermeasures present [MOP07].


A key advantage of DSCA techniques over SSCA techniques, besides their more
robust information recovery capability, is that they do not require detailed
implementation knowledge. Rather, basic knowledge of the underlying algorithm is sufficient
to carry out the statistical analysis. In fact, DSCA techniques have the interesting
property (as pointed out in [KJJ99]) that they “automatically” identify the points
in time where the side channel leakage is correlated to the value of some piece of
data that is used by the cryptographic algorithm, since the chosen metric indicating
the presence of correlation will be maximized at the relative times in the trace when
the relationship is strongest. The leakage mapping approach developed in Chapter 4
provides a methodology for systematically identifying all such potential points of
key-dependent leakage throughout an entire encryption operation.


Mangard et al. and other researchers have noted that even if the details of
a particular implementation are unknown, DSCA and SSCA
techniques can be used together to learn sufficient details about an implementation
to allow a successful SCA attack [MOP07].

The more accurately the leakage model employed characterizes the true
relationship of a particular intermediate value to the side channel leakage from the
device, the better the results will be in the final step of the DSCA attack. Various
techniques for modeling the leakage associated with an intermediate computation
are discussed in more detail in Chapter 4.


2.5.3.1 Statistical Techniques. There is no consensus on an optimal
mathematical technique for the DSCA procedure, although various proposals have
been put forward with claims of optimality. Kocher’s original DPA attack was based
on a difference of means technique [KJJ99, MOP07]. Later works introduced other
approaches including the Pearson correlation coefficient, Distance of Means (a
variation of Kocher’s difference of means), and Bayesian estimation procedures. The
correlation coefficient and Bayesian techniques have received the most attention in
recent research. Mangard et al. claim the correlation coefficient is the most
general because it can more easily handle multiple-bit leakage models than some other
techniques [MOP07]. Recent experimental data supports the claim that all of the
commonly used statistical techniques produce roughly equivalent results [MOS09].


The basic strategy behind all of these techniques is to identify linear
relationships between the predicted leakage under some leakage model and one or more
columns of the actual observed data matrix, where each column corresponds to a
particular sampled instant in time relative to the start of the encryption operation.
When the key hypothesis is correct, there should be a linear relationship between the
hypothesized leakage value and the observed leakage value at the times when that
value was actually manipulated by the circuit. If Pearson’s correlation coefficient
is used, the highest observed correlation indicates the key hypothesis
most likely to be correct. Furthermore, the relative position of the samples with
the highest correlation coefficients indicates at what relative time(s) the targeted
intermediate value is manipulated. The DSCA procedure is described in detail in
Chapter 4.
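The strategy can be sketched on simulated data. The example below is a minimal illustration under stated assumptions, not the procedure of Chapter 4: the S-box is a randomly shuffled stand-in rather than a real cipher table, the leakage is the Hamming weight of one intermediate plus Gaussian noise at a single sample, and all names and parameter values are hypothetical.

```python
import random
from math import sqrt


def hamming_weight(x):
    return bin(x).count("1")


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return 0.0 if sxx == 0 or syy == 0 else sxy / sqrt(sxx * syy)


rng = random.Random(1)
SBOX = list(range(256))
rng.shuffle(SBOX)                       # assumed stand-in for a cipher S-box
TRUE_KEY = 0x3C                         # secret sub-key of the simulated device
N_TRACES, N_SAMPLES, LEAK_SAMPLE = 300, 5, 3

# Simulated observed data matrix: one row per trace, one column per time sample.
plaintexts = [rng.randrange(256) for _ in range(N_TRACES)]
traces = []
for p in plaintexts:
    row = [rng.gauss(0.0, 0.5) for _ in range(N_SAMPLES)]
    row[LEAK_SAMPLE] += hamming_weight(SBOX[p ^ TRUE_KEY])
    traces.append(row)
columns = [[row[t] for row in traces] for t in range(N_SAMPLES)]

# Correlation matrix: one row per key hypothesis, one column per time sample.
corr = []
for k in range(256):
    predicted = [hamming_weight(SBOX[p ^ k]) for p in plaintexts]
    corr.append([pearson(predicted, col) for col in columns])

# The largest absolute correlation identifies both the key and the leak time.
best_key, best_time = max(
    ((k, t) for k in range(256) for t in range(N_SAMPLES)),
    key=lambda kt: abs(corr[kt[0]][kt[1]]),
)
```

Under these assumptions the correct hypothesis stands well clear of the others; with real measurements the margin depends on the noise level and on how well the Hamming-weight model fits the device.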


The largest correlation coefficients indicate the sub-key most likely to have
produced the observed results. Plotting the rows of the correlation matrix will result
in noticeable peaks in the plot corresponding to the true key, while incorrect keys
have the appearance of noise within a solid band. A typical DPA result for three key
hypotheses (and reference current trace) is plotted in Fig. 2.4. The trace showing
a large peak indicates the correct (sub)key hypothesis has likely been found. In
practice, keys similar to the true key may also produce high correlations, leaving
some ambiguity as to what the correct key is, or even resulting in an incorrect key
having the highest correlation coefficient at some point in time. This can sometimes
be overcome directly through additional post-processing or by collecting additional
observations. Furthermore, even if a DSCA attack leaves many possible candidate
sub-keys, the reduction in guessing entropy may be sufficient to allow brute force
computation of the full key.

Figure 2.4 Typical DPA traces, one correct and two incorrect, with power reference
[KJJ99].


2.5.3.2 Practical Limitations of DSCA. Because DSCA techniques
require less implementation knowledge and are typically better than SSCA at
extracting information, they are generally considered to be a more powerful class of
attacks. However, if significant noise is present the statistical techniques used can
require a very large number of observations to be successful. Collecting a large
number of observations in practice may require an attacker to have dedicated access and
control of the device being analyzed. Thus, in practice, the security vulnerabilities
of a system to DSCA techniques may be less of a concern to the system designer
than SSCA vulnerabilities.

Another practical limitation of standard DSCA techniques is that an adversary
must generally be able to obtain either the inputs or the outputs of the device,
although some advanced differential techniques (see Section 2.5.5.3) can be applied
in situations when this is not possible. AES, when operating in counter-mode, is
also susceptible to DPA techniques with knowledge of side channel data only (no
knowledge of inputs or outputs is necessary) [Jaf07].


2.5.4 Profiling Techniques. Profiling, or multi-stage, attacks were first
introduced by Chari et al. in 2002 [CRR02] in the context of side channel
cryptanalysis. In general, profiling techniques are specific applications of the general problem
of pattern recognition, in which a classifier is presented with a pattern or signal
(e.g., the EM leakage from a target device), and must then classify it as belonging
to one of several possible classes [TK09].

These attacks postulate a powerful adversary that possesses a separate identical
(or nearly identical) device over which they have full control. In the variation known
as a template attack, this training device is used for a profiling step, in which an
attacker creates a precise multivariate probability distribution of the device’s side
channel leakage while it operates on known sub-keys [GLRP06]. This phase of the
attack is sometimes referred to as the offline phase.


The results of the profiling stage are used to classify future observations from
a target device over which the adversary does not have full control, but whose
side channel leakage can be observed. By classifying each newly observed trace according
to the distribution it most likely came from, the most likely sub-key is revealed. Profiling
techniques are considered very powerful because they require only a few (sometimes
just one) observations to extract a key. Additionally, the attack phase does not
require that the adversary control or have knowledge of the device inputs or outputs.
The attack phase is sometimes referred to as the online phase.

Two specific variations of multi-stage attacks that have been proposed are the
template attack [CRR02] and the stochastic model attack [SLP05]. Template attacks
characterize the side channel leakage from the device by creating a template for each
possible sub-key. The noise distribution is therefore assumed to be key-dependent.
Thus, the profiling stage of a template attack creates a covariance matrix for each
possible (sub)key. Schindler et al.’s stochastic model, on the other hand, presumes
the side-channel noise is independent of the (sub)key.
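The two phases of a template attack can be sketched as follows. This is a deliberately simplified illustration: each template stores only a per-sample mean and variance (that is, a diagonal covariance matrix is assumed) instead of the full covariance matrix used in [CRR02], and the four-class simulated device, noise level, and function names are hypothetical.

```python
import random
from math import log


def build_templates(training):
    """Offline phase: training maps class label -> list of traces."""
    templates = {}
    for label, traces in training.items():
        n, n_samples = len(traces), len(traces[0])
        means = [sum(tr[t] for tr in traces) / n for t in range(n_samples)]
        variances = [
            sum((tr[t] - means[t]) ** 2 for tr in traces) / (n - 1)
            for t in range(n_samples)
        ]
        templates[label] = (means, variances)
    return templates


def log_likelihood(trace, means, variances):
    """Gaussian log-likelihood up to a constant shared by all classes."""
    return sum(
        -0.5 * log(v) - (x - m) ** 2 / (2.0 * v)
        for x, m, v in zip(trace, means, variances)
    )


def classify(trace, templates):
    """Online phase: return the class whose template best explains the trace."""
    return max(templates, key=lambda label: log_likelihood(trace, *templates[label]))


rng = random.Random(3)


def simulate(label):
    # Assumed training device: only the middle sample depends on the sub-key.
    return [rng.gauss(0.0, 0.3), label + rng.gauss(0.0, 0.3), rng.gauss(0.0, 0.3)]


training = {label: [simulate(label) for _ in range(200)] for label in range(4)}
templates = build_templates(training)
predicted = classify([0.0, 2.0, 0.0], templates)
```

Because the variances are estimated separately for every class, the sketch keeps the key-dependent noise assumption of template attacks; pooling a single variance estimate across all classes would correspond more closely to the key-independent noise assumption of the stochastic model.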

Gierlichs et al. showed that while stochastic models have a lower up-front
computational cost during profiling, they are less effective at classifying future signals
[GLRP06]. Template attacks are the more effective tool if there are no limitations on
the number of observations that can be collected during profiling and if the workload
for template building is computationally feasible. Schindler’s stochastic model may
be more effective in cases where there is a bound on the number of observations that
can be collected during profiling, or if other considerations mean constructing a full
set of templates is not practical.


In 2005, Agrawal et al. combined traditional DPA techniques with template
attacks in what they termed a template-enhanced DPA attack. This
technique is effective at defeating standard masking countermeasures against power
analysis attacks on smart-card implementations of AES and DES produced by two
separate manufacturers [ARRS05]. Recent work has shown in detail how template
attacks can be used in practice to extract a cryptographic key without knowledge
of the system inputs or outputs (plain- or cipher-texts) with only a few
(< 200) observed operations [HTM09].


A practical limitation of template attacks is the computational cost due to their
reliance on multivariate statistics to characterize the dependencies among the various
temporal locations in the leakage traces. For traces containing a large number of
sample points, this step is very computationally intensive, and it may not be practical
to consider all the sampled data. In practice, most of the published techniques
overcome this limitation by selecting a subset of the data points for use in the
profiling / template building stage. Typically, some heuristic is used to select a subset
of points of interest that correspond to the same relative times in each observation
[MOP07, APSQ06].


Several rules of thumb have been used, including selecting the subset of
samples that correspond to the relative time where the observations show the maximal
variance [MOP07] or maximal difference between mean traces when observations are
partitioned according to a classical Kocher-style DSCA attack [CRR02]. Additional
criteria are sometimes applied, such as limiting the number of selected points to one
per clock cycle to eliminate redundancy of the selected data points [APSQ06]. A
primary objective is to compress the data set that must be manipulated while
maintaining the most important information, which then takes on the role of
characterizing the features of different signal classes. Recent research to find better
techniques for identifying leakage points is discussed below.
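One of these heuristics can be sketched in a few lines. The example below scores each sample by the summed absolute difference between class mean traces and then keeps at most one point per clock cycle; the scoring rule, the function name, and the toy mean traces are assumptions made for illustration rather than the exact procedure of any cited work.

```python
def select_points_of_interest(class_means, count, samples_per_cycle):
    """Pick up to `count` sample indices, at most one per clock cycle."""
    n_samples = len(class_means[0])
    score = [0.0] * n_samples
    # Sum of absolute pairwise differences between the class mean traces.
    for a in range(len(class_means)):
        for b in range(a + 1, len(class_means)):
            for t in range(n_samples):
                score[t] += abs(class_means[a][t] - class_means[b][t])

    chosen, used_cycles = [], set()
    for t in sorted(range(n_samples), key=lambda t: score[t], reverse=True):
        cycle = t // samples_per_cycle
        if cycle in used_cycles:
            continue                     # skip redundant points in the same cycle
        chosen.append(t)
        used_cycles.add(cycle)
        if len(chosen) == count:
            break
    return sorted(chosen)


# Two toy class mean traces, eight samples, four samples per clock cycle.
means = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 3.0, 2.0, 0.0, 0.0, 0.0, 1.0, 0.0],
]
points = select_points_of_interest(means, count=2, samples_per_cycle=4)
```

Sample 2 has the second-highest score but lies in the same clock cycle as sample 1, so the next point is taken from the following cycle.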


2.5.5  Advanced  Techniques. 


2.5.5.1 Principal Component Analysis. Principal component
analysis (PCA) is a linear transformation that reduces the dimensionality of a data set
while retaining the majority of the important information from the original data.⁴ A
well-known application of PCA is automatic facial recognition. The PCA
transformation is such that each coordinate (component) is orthogonal to the others and is
a linear combination of many individual samples. By using PCA to select the points
of interest in profiling attacks, the n dimensions (components) that account for the
largest percentage of the overall variance between the sample classes are identified.
In template attacks, PCA maximizes the inter-class variance between possible
operations (typically the same operation performed using each possible sub-key). The
magnitude of the resulting eigenvalues associated with each component indicates
the relative importance of the component, and normally a small number of
principal components are sufficient to capture the majority (80-90%) of the inter-class
variance.

⁴ PCA and its variations are also known as the empirical/discrete Karhunen-Loève transform,
the Hotelling transform, and proper orthogonal decomposition (POD) in various academic circles.
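The idea can be illustrated with a small sketch that extracts the first principal component of a set of class mean traces and reports the fraction of inter-class variance it captures. Power iteration is used here only to keep the example free of external libraries; it is an assumption of this sketch, as are the function name and the toy mean traces.

```python
from math import sqrt


def first_principal_component(class_means, iterations=100):
    """Return (component, explained_fraction) for the class mean traces."""
    n_classes, n_samples = len(class_means), len(class_means[0])
    grand_mean = [sum(m[t] for m in class_means) / n_classes for t in range(n_samples)]
    centered = [[m[t] - grand_mean[t] for t in range(n_samples)] for m in class_means]
    total = sum(x * x for row in centered for x in row)   # trace of the scatter matrix
    if total == 0.0:
        raise ValueError("class means are identical")

    v = [1.0 / sqrt(n_samples)] * n_samples               # arbitrary unit start vector
    eigenvalue = 0.0
    for _ in range(iterations):
        # Multiply v by the scatter matrix without forming the matrix explicitly.
        projections = [sum(c * x for c, x in zip(row, v)) for row in centered]
        w = [sum(p * row[t] for p, row in zip(projections, centered))
             for t in range(n_samples)]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
        eigenvalue = norm                                  # converges to the top eigenvalue
    return v, eigenvalue / total


# Toy mean traces for four classes: sample 2 separates the classes strongly,
# sample 5 only weakly.
means = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.0, 0.2],
    [0.0, 0.0, 2.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 0.0, 0.0, 0.2],
]
component, explained = first_principal_component(means)
dominant_sample = max(range(len(component)), key=lambda t: abs(component[t]))
```

In this toy case a single component captures nearly all of the inter-class variance, which is the situation in which a small number of components suffices for template building.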


Archambeau et al. introduced a systematic technique using PCA for selection
of leakage points of interest that can subsequently be used to build templates as an
improvement on previous heuristic techniques [APSQ06]. Archambeau
experimentally demonstrated a case where 7 components chosen by PCA produce better
classification results (93.3% vs. 91.8% correct) than a template built from 42 points using
a difference of means heuristic. The paper does not, however, compare the
computational efficiency of the combined PCA and template-building procedure with that of
previous techniques using larger numbers of sample points for the template-building
phase.


Archambeau’s PCA technique has two notable advantages over heuristic
techniques for selecting points of interest. First, it provides superior classification results
with a smaller computational effort during the template-building phase, although
the computational load of the PCA transformation itself may negate this benefit.
Second, PCA provides a quantitative measure of the number of components
(transformed data points) needed to adequately capture the majority of the data’s variability. A
key assumption that has thus far resulted in effective attacks in practice is that the
side-channel variability is a good indicator of the temporal location of information
leakage.


A weakness of the PCA technique is that although the resulting subspace
maximizes the inter-class variance between possible classes (sub-keys in most template
attacks), the technique neglects the effect of intra-class variance on classifier
performance. This is because the principal components are computed from the mean
traces for each possible signal class (generally an operation performed for a given
sub-key in most SCA template attacks).⁵ As a result, if there is a large degree of
variance in the side-channel leakage for a single class (sub-key or intermediate value),
it could result in degraded classifier performance.


2.5.5.2 Fisher’s Linear Discriminant Analysis (LDA). Standaert
and Archambeau proposed applying an alternate technique known as Linear
Discriminant Analysis (LDA) to address the possible effects of intra-class variance on
classifier performance [SA08]. LDA seeks a linear transformation of sample points
into a subspace that maximizes the ratio between the inter-class distance and the
intra-class variance after projection. Experimental data shows that LDA, as
intuitively expected, provides a better characterization of the target signal classes
than does PCA. However, computation of the LDA solution is substantially more
expensive than PCA, which limits the number of samples per observation that
can be handled. Although more expensive, the authors demonstrated that LDA
is an effective tool for dimensionality reduction for use in device profiling. In this
work, an n-class variant of LDA known as Multiple Discriminant Analysis (MDA)
is employed for dimensionality reduction in Chapter 3.


2.5.5.3 Higher-Order DSCA. The DSCA techniques described to this
point are first-order DSCA attacks. The attack order refers to the number of samples
from the trace that are simultaneously considered [MOP07]. If more than one sample
is considered, then the attack is known as a higher-order DSCA attack (HO-DSCA).
A dth-order DSCA attack is one which simultaneously considers d samples of the
side-channel trace. HO-DSCA attacks were first proposed by Kocher in his original
DPA paper, and have been investigated extensively since [KJJ99, Roh06, MOP07].

⁵ PCA could be performed directly on the raw data, but instead of capturing the components of
maximal variance between classes, it would result in components that capture the
maximal variance across all data (irrespective of the classes).

The motivation for HO-DSCA attacks is to overcome the masking countermeasure
described in Section A.4. To be effective against masking, HO-DSCA considers d
separate points that correspond to d different sensitive intermediate values, each of
which is protected by the same mask. Theoretically, a dth-order DSCA attack is
capable of bypassing a (d - 1)th-order masking scheme [CJRR99].


The precise time when the d sensitive intermediates are manipulated is
generally unknown a priori by the adversary, although most of the available literature on
HO-DSCA attacks assumes these locations are known [PRB09]. However, for
practical implementation reasons, masking implementations usually generate two sensitive
intermediates using the same mask in the first and last rounds. Therefore, in
practice, attacking intermediates in the first and last round has been shown to be effective
against many masking implementations [OMPR05, MOP07]. Frequency-domain
signal processing techniques have also been proposed as a technique to identify the
manipulation times [WW04]. In the context of profiling techniques, an adversary
has control of a training device and can identify relative times by carefully profiling
the device’s behavior.

Once the times of the sensitive data manipulations are identified, the data is
pre-processed to combine the multiple sensitive variables through some combining
function (typically additive or multiplicative), which maps the multivariate problem
to a univariate problem [MOP07]. After pre-processing, the DSCA attack proceeds
as normal.
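The effect of the pre-processing step can be demonstrated on simulated, noise-free data. In the sketch below, two intermediates a and b are protected by the same random mask m, each leaks its masked Hamming weight at a known sample, and the combining function is the product of the mean-centered samples. The leakage model, the choice of combining function, and all names are assumptions of this illustration.

```python
import random
from math import sqrt


def hamming_weight(x):
    return bin(x).count("1")


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return 0.0 if sxx == 0 or syy == 0 else sxy / sqrt(sxx * syy)


def centered(xs):
    mean = sum(xs) / len(xs)
    return [x - mean for x in xs]


rng = random.Random(7)
N = 4000
a_values = [rng.randrange(256) for _ in range(N)]
b_values = [rng.randrange(256) for _ in range(N)]
masks = [rng.randrange(256) for _ in range(N)]        # fresh mask per execution

# Two leaking samples, each protected by the same mask within one execution.
leak_a = [hamming_weight(a ^ m) for a, m in zip(a_values, masks)]
leak_b = [hamming_weight(b ^ m) for b, m in zip(b_values, masks)]

# The unmasked combination a XOR b is what the attacker can predict.
model = [hamming_weight(a ^ b) for a, b in zip(a_values, b_values)]

# First order: a single masked sample carries no usable correlation.
first_order = pearson(model, leak_a)

# Second order: the multiplicative combining function removes the mask's effect.
combined = [x * y for x, y in zip(centered(leak_a), centered(leak_b))]
second_order = pearson(model, combined)
```

The combined signal correlates negatively with the predicted Hamming weight because bits that differ between a and b make the two masked samples move in opposite directions; an attack would therefore rank hypotheses by the magnitude of the correlation.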

Because of the practical issues involved in locating the sensitive
manipulations, and because the complexity of carrying out such an attack is believed to increase
exponentially with the attack order, HO-DSCA attacks are sometimes dismissed as
impractical [MOP07, PRB09]. However, Oswald et al. published a practical attack on an 8-bit
smart-card micro-controller protected by a 2nd-order masking scheme with under 400
observations, without any a priori knowledge of the times when sensitive data would
be manipulated [OMPR05]. Other recent work indicates that attacks with
sub-exponential complexity growth are possible [GBPV10]. Additionally, Mangard et al. found that
hardware masking implementations are frequently easier to attack than software
implementations because of the unintentional effects of parallel implementations that
simultaneously process both random masks and masked values [MOP07].


2.5.5.4 Multi-channel Attacks. Agrawal et al. proposed combining
information leakage from multiple simultaneous side channels [ARR03]. Their
approach is a generalization of standard DSCA techniques and requires that leakages from
the combined channels be very “similar” at the times when the side channel signal is
correlated to the underlying data or operations of interest. For the particular DES
implementation studied, a multi-channel EM and power attack is significantly more
efficient than either single-channel attack alone. The number of traces required to
successfully extract the DES key using a maximum-likelihood based DPA attack is
substantially reduced, in some extreme cases by more than 80%. The initial
approach does not allow for combination of leakages that occur at different relative
times in the side channel trace.

Standaert and Archambeau later extended the concept of a multi-channel
attack to the creation of multi-channel templates [SA08]. In their technique, the power
and EM side channel measurements (taken simultaneously during a single operation)
are simply concatenated together to create a combined power / EM feature vector.
Templates are then built normally, as described in Section 2.5.4. The authors found
that the combined power and EM template attack performed significantly better
than either technique alone (and that EM alone performed significantly better than
power alone) for the 8-bit micro-controller evaluated.


2.5.5.5 Combination of SCA with Other Techniques. Although
generally outside the scope of this research (with the exception of the algebraic
cryptanalysis technique described below), a related research area combines SCA techniques
with other attacks, including invasive or semi-invasive attacks [SA03, Roh06, Sko06]
and mathematical cryptanalysis techniques to augment SCA attacks (or vice versa)
[Roh06]. For example, side channel information can be used to precisely time the
injection of faults [Roh06].


For secure cryptographic algorithms, mathematical cryptanalysis is not a
credible threat in itself since the required computational effort makes it impractical,
assuming the mathematical basis has no flaws. Implementation attacks such as
side-channel attacks, on the other hand, exploit the vulnerabilities introduced by the
physical realization of the cryptographic algorithm, which is not a pure black-box
representation of the cipher. Even if an SCA attack is unsuccessful at extracting
a full cryptographic key from a physical system, it may be successful at identifying
a portion of the key or eliminating some class of possibilities. In such a case, the
problem complexity may be reduced enough to make a brute force or other direct
cryptanalytic attack practical [Roh06].


One of the most interesting and effective techniques published to date is the
recent work by Renauld and Standaert combining profiling SCA attacks with
algebraic cryptanalysis [RSVC09]. Rather than focusing on a single intermediate value,
algebraic SCA techniques use extensive profiling of a device to extract as much
leaked information as possible about the various intermediate values.
Standard template-based SCA techniques are used for the profiling and for
extracting intermediate values from the target device. The extracted intermediates
become known values in an over-defined system of boolean equations that can be
solved for the unknown key using an automated boolean satisfiability (SAT) solver.
The results achieved using this technique are compelling, in that this appears to be
the first technique to claim the ability to extract an AES key with a single target
trace and no knowledge of plaintext or ciphertext. The technique currently depends
on very accurate extraction of the intermediate values used in the solution, although
the authors suggest several ways this dependency can be reduced in practice.
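
The idea of turning leaked intermediate values into constraints on the key can be
illustrated with a deliberately small sketch. This is not the attack of [RSVC09]:
the toy cipher, the pseudo-random S-box, the Hamming-weight leakage model, and the
exhaustive search that stands in for a SAT solver are all assumptions made for
illustration only.

```python
import random

def hamming_weight(x):
    """Number of 1 bits in a byte; a common toy model of leakage."""
    return bin(x).count("1")

# Hypothetical 8-bit S-box: a fixed pseudo-random permutation (not the AES S-box).
SBOX = list(range(256))
random.Random(1).shuffle(SBOX)

def intermediates(key, plaintext, rounds):
    """Intermediate states of a toy cipher: state = SBOX[state XOR key] each round."""
    state, values = plaintext, []
    for _ in range(rounds):
        state = SBOX[state ^ key]
        values.append(state)
    return values

def consistent_keys(leaked_weights):
    """Keys for which at least one plaintext reproduces every leaked weight."""
    keys = []
    for key in range(256):
        for plaintext in range(256):
            state = plaintext
            for weight in leaked_weights:
                state = SBOX[state ^ key]
                if hamming_weight(state) != weight:
                    break          # constraint violated; try the next plaintext
            else:
                keys.append(key)   # every constraint satisfied
                break
    return keys

# The "attacker" sees only Hamming weights, never the plaintext or ciphertext.
secret_key, secret_plaintext = 0x3C, 0xA7
leak = [hamming_weight(v) for v in intermediates(secret_key, secret_plaintext, 12)]
candidates = consistent_keys(leak)
print(len(candidates), "candidate key(s); true key included:", secret_key in candidates)
```

The true key always survives the filtering, and each additional leaked intermediate
tends to shrink the candidate list. A real algebraic attack encodes the cipher as
boolean equations instead of enumerating keys, which is what makes it feasible for
full-size keys.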
2.6 Countermeasures


All SCA countermeasures have a common objective: the elimination of discernible
linkages between sensitive internal circuit behavior (data manipulation and
operations) and one or more external, physically observable phenomena. In practice,
completely eliminating information leakage from a single side channel is an extremely
difficult problem. The problem becomes more difficult when all possible side channels
are considered. Various researchers have asserted that it is fundamentally impossible
to make the side channel emissions of a device completely independent of the
underlying computations [MOP07, Roh06].


As is the case when security measures are implemented for any type of system,
all of the countermeasures discussed in this section carry with them associated
implementation  costs.  Typical  costs  are  slower  performance,  increased  circuit  size, 
increased  power  consumption,  and/or  increased  design  time  and  expense.  For  this 
reason,  in  the  complex  trade-space  of  real  systems,  designers  may  focus  on  the  most 
sensitive  or  critical  circuit  areas,  and  attempt  to  prevent  information  leakage  only 
from  those  subcomponents.  In  practice,  system  designers  and  architects  balance 
side-channel  leakage  resistance  and  other  security  measures  with  required  system 
cost  and  performance.  Rather  than  attempting  to  build  an  impenetrable  system,  a 
suitable  goal  is  a  design  secure  enough  to  deter  would-be  adversaries  from  applying 
the  necessary  resources  to  defeat  it.  For  a  rational  adversary,  this  is  achieved  in 
practice  if  the  costs  of  carrying  out  an  attack  outweigh  the  perceived  benefits  of 
defeating  the  system’s  protections. 


At the highest level, countermeasures are classified as either procedural or
physical design countermeasures. Procedural countermeasures include security practices
put in place to improve the security of a system. For example, a system architect
may understand that a device is vulnerable to various SCA attacks. Knowing this,
the architect may impose a restriction on the number of times a unique key can be
used before it must be changed so that if a particular key is compromised, it will not
negatively affect the security of future transactions [And01, CJRR99].


Physical design countermeasures can be sub-divided into two primary approaches:
reduce the signal and increase the noise. Both approaches degrade the
signal-to-noise ratio of information leakage during intermediate computations to the
point where it becomes impractical (in terms of number of required observations or
pre- and post-processing time) to extract information of concern from the physically
observable phenomenon. The actual implementation of either of these approaches
can take place at various levels of system design such as the protocol, system
architecture, operating system, software (algorithm) or hardware levels [RO04a].
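
The effect of a degraded signal-to-noise ratio on the number of required
observations can be estimated with a short calculation. The sketch below uses a
rule of thumb commonly attributed to [MOP07]; the two formulas, the function names,
and the default confidence constant are assumptions introduced here and should be
checked against that reference before being relied upon.

```python
import math

def correlation_from_snr(snr, rho_ideal=1.0):
    """Correlation between hypothesis and measurement for a given (linear) SNR.

    Assumed model: rho = rho_ideal / sqrt(1 + 1/SNR).
    """
    return rho_ideal / math.sqrt(1.0 + 1.0 / snr)

def traces_needed(rho, z=3.719):
    """Approximate number of traces needed to detect a correlation rho.

    Assumed rule of thumb: n = 3 + 8 * (z / ln((1 + rho) / (1 - rho)))**2,
    where z = 3.719 corresponds to a confidence of 0.9999.
    """
    return 3 + 8 * (z / math.log((1 + rho) / (1 - rho))) ** 2

for snr in (1.0, 0.1, 0.01):
    rho = correlation_from_snr(snr)
    print(f"SNR={snr:<5} rho={rho:.3f} traces~{traces_needed(rho):.0f}")
```

Under this model, either reducing the signal or increasing the noise lowers the
correlation, and the required number of traces grows roughly with the inverse of
the SNR once the SNR is small.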


For each of the two general approaches, a large number of specific techniques
have been proposed. Most counter a specific side-channel threat, e.g., power or
timing information leakage, since these have been the subject of the majority of the
SCA research to date. Notably, no perfect countermeasure exists to date, and most
of the proposed countermeasures still have significant SCA vulnerabilities. Some of
the most popular and promising countermeasures in theory have been shown to be
inadequate in practice soon after their introduction. Likewise, some widely used
countermeasures such as masking can be broken almost completely in the context
of profiling scenarios where the adversary has access to and full control of a separate
training device [OM07]. However, in a well-designed system the countermeasures
may be sufficient as part of an integrated defense-in-depth strategy to deter less
determined adversaries, or at least be sufficient to protect a cryptographic key or
secure device for its useful life [CJRR99].


The principal SCA countermeasures that have been proposed to date are
described in Appendix A.
2.7 Practical Considerations


Numerous publications have documented the nuances of implementing side
channel attacks in practice, with the book by Mangard et al. being the most
thorough treatment [MOP07]. This section briefly reviews techniques for experimentally
collecting side channel data and pre- and post-processing of the data to align traces.
Each side channel attack is unique and dependent on the specific device
implementation being targeted. However, the approach outlined below is mostly generic
and should apply to most situations.


There are several ways side channel data can be experimentally collected. The
experimental setup used in this research to collect EM data is described in Section
3.11.1. The most common technique for capturing power consumption data is for
an attacker or security evaluator to insert a small resistor (1-100 Ohms) in series
with either the primary power supply line or the ground line, and sample the voltage
difference across the resistor using a digital oscilloscope [MOP07]. If the resistance
is too small, the voltage drop is difficult to measure, and if it is too large it
may cause the circuit to malfunction. The time-varying voltage is proportional to
the current drawn from the power supply and, for a fixed supply voltage,
approximately proportional to the power consumed by the circuit (hence the name
power analysis). The procedure is semi-invasive since it requires
circuit or power supply modifications. A typical power analysis setup is shown in
Fig. 2.5.
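
The relationship between the sampled voltage and the quantities of interest follows
from Ohm's law. The following minimal sketch converts oscilloscope samples to
current and approximate power; the function name and the example values (a 10 Ohm
resistor and a 3.3 V supply) are hypothetical.

```python
def shunt_to_power(voltage_drop_samples, shunt_ohms, supply_volts):
    """Convert sampled shunt voltages (V) to current (A) and device power (W)."""
    # Ohm's law: the current through the series resistor is V / R.
    currents = [v / shunt_ohms for v in voltage_drop_samples]
    # The device sees the supply voltage minus the drop across the resistor.
    powers = [(supply_volts - v) * i
              for v, i in zip(voltage_drop_samples, currents)]
    return currents, powers

currents, powers = shunt_to_power([0.05, 0.08, 0.02], shunt_ohms=10.0, supply_volts=3.3)
print(currents, powers)
```

The example also shows the trade-off described above: a larger resistor gives a
larger, easier-to-measure voltage drop, but leaves less voltage for the device.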


An  alternate  technique  preferred  by  some  is  to  use  a  contact-less  current  probe 
to  collect  the  power  consumption  data.  This  technique  is  less  invasive,  and  only 
requires  passing  the  power  supply  line  through  the  measurement  device.  However, 
Mangard  et  al.  state  that  such  a  setup  will  have  lower  sensitivity  than  the  former 
direct  measurement  technique. 


Figure 2.5 A typical power analysis experimental setup [MOP07].

A block diagram of a typical experimental setup, representative of the setups
used for this research, is shown in Fig. 2.6. The steps involved in this experimental
setup are [MOP07]:


1.  Supply  the  cryptographic  device  with  power  and  a  clock  signal. 

2.  Configure  and  arm  the  oscilloscope  from  the  controlling  PC. 

3.  Command  the  target  cryptographic  device  to  begin  execution  and  trigger  data 
collection  by  the  oscilloscope. 

4.  Digitally  sample  the  voltage  variations  across  the  resistor. 

5.  Collect  the  output  of  the  cryptographic  operation. 

6.  Retrieve  the  recorded  side-channel  data. 


Figure 2.6 Block diagram of typical experimental setup for side-channel analysis
(adapted from [MOP07]).
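
The six collection steps listed above can be sketched as a simple acquisition loop.
No real instrument API is used here: SimulatedTarget and SimulatedScope are
hypothetical stand-ins, the "cipher" is a single XOR, and the leakage model (a
Hamming-weight spike at one sample plus Gaussian noise) is an assumption.

```python
import random

class SimulatedTarget:
    """Hypothetical stand-in for the powered, clocked cryptographic device (step 1)."""
    def __init__(self, key):
        self.key = key

    def encrypt(self, plaintext):
        return plaintext ^ self.key            # toy one-byte "cipher"

    def leakage(self, plaintext, n_samples, rng):
        # Toy leakage: Hamming weight of the result at sample 10, plus noise.
        hw = bin(plaintext ^ self.key).count("1")
        return [(hw if i == 10 else 0) + rng.gauss(0, 0.1) for i in range(n_samples)]

class SimulatedScope:
    """Hypothetical stand-in for the digital oscilloscope."""
    def __init__(self, n_samples):
        self.n_samples = n_samples
        self.armed = False
        self.buffer = None

    def arm(self):
        self.armed = True

    def capture(self, target, plaintext, rng):
        assert self.armed, "scope must be armed before the trigger"
        self.buffer = target.leakage(plaintext, self.n_samples, rng)
        self.armed = False

    def read(self):
        return self.buffer

def acquire(target, scope, plaintexts, seed=0):
    """Return one (plaintext, ciphertext, trace) record per plaintext."""
    rng = random.Random(seed)
    records = []
    for pt in plaintexts:
        scope.arm()                              # step 2: configure and arm
        scope.capture(target, pt, rng)           # steps 3-4: trigger and sample
        ct = target.encrypt(pt)                  # step 5: collect the output
        records.append((pt, ct, scope.read()))   # step 6: retrieve the trace
    return records

records = acquire(SimulatedTarget(0x5A), SimulatedScope(32), [0, 1, 2])
```

Storing the plaintext and ciphertext alongside each trace matters because most
differential attacks need one of them to compute hypothetical intermediate values.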
Even if the target system has its own power and clock sources, it may be more
effective to substitute more precise components to provide the power and clock
signal. The data collection technique may be adapted by the attacker to meet the
unique objectives of a particular SCA attack against the system of interest, which
may be driven by, for example, what countermeasures the system has implemented and
the accessibility of any internal circuitry. Modern chips often have several power
and ground pins as well as complex power distribution networks with filtering
capacitors that reduce the effectiveness of power analysis attacks. To achieve
better results, it may be necessary to target a specific pin rather than attempting
to perform power analysis on the noisy global power source.


Quisquater and Samyde were the first to report application of DPA-like
experiments on EM emissions [QS01]. Their initial measurements were taken from a
circuit using a small (less than 2 cm diameter) flat coil of copper wire attached to
a digital oscilloscope as the measuring device. The original experiments were
conducted inside a Faraday cage to minimize received noise from external sources of
electromagnetic energy. However, most recent EM attacks have reported minimal noise
even when EM measurements are conducted without such an enclosure. Quisquater and
Samyde were also the first to report using a motorized table to precisely position
the EM probe over the device under test. Using this technique, they demonstrated the
ability to create a three-dimensional map of the EM leakage (magnitude) produced
by the device. Similar capabilities, including the Riscure Inspector system used for
this work [Ris09], are now commercially available.


2.7.0.6 Signal Processing Techniques. In practical SCA attacks, signal
misalignment and countermeasures can make straightforward application of the
DSCA techniques impossible or at least very time consuming. In some cases, the
collected signals must be pre- or post-processed to make an attack possible.
Signal misalignment can severely impact the results of a DSCA attack, and several
proposed countermeasures intentionally induce temporal misalignment. If the
collected side channel traces do not all begin at precisely the same relative time,
noise is introduced into the process, and many more measurements are required to
mount a successful attack. In experimental setups, initial alignment is achieved by
asserting a signal that tells the digital oscilloscope to begin collecting data at
the instant the ciphering operation begins. In real-world applications of SCA
techniques, there is typically no trigger signal present. Certain commercial systems
(e.g., the icWaves subsystem of the Riscure Inspector side channel analysis system)
can recognize, in real-time, the signal features that indicate an encryption
operation of interest is beginning [Ris09]. These systems obtain traces that are
nominally aligned at the start of the encryption operation. Mangard et al. discuss
a number of techniques that can be used to statically align trace data when the
digital sampling does not begin at precisely the same time [MOP07].
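
One simple form of static alignment shifts each trace so that it best matches a
reference trace. The sketch below is a minimal illustration using unnormalized
cross-correlation; it is not taken from [MOP07], and the function names and the
zero-padding choice are assumptions. In practice a normalized correlation over a
distinctive trace segment is usually preferable.

```python
def best_shift(reference, trace, max_shift):
    """Shift s (in samples) that maximizes sum(reference[i] * trace[i + s])."""
    best, best_score = 0, float("-inf")
    for s in range(-max_shift, max_shift + 1):
        score = sum(reference[i] * trace[i + s]
                    for i in range(len(reference))
                    if 0 <= i + s < len(trace))
        if score > best_score:
            best, best_score = s, score
    return best

def align(reference, trace, max_shift):
    """Return a copy of `trace` shifted to line up with `reference`."""
    s = best_shift(reference, trace, max_shift)
    n = len(trace)
    # Samples shifted in from outside the record are padded with zeros.
    return [trace[i + s] if 0 <= i + s < n else 0.0 for i in range(n)]

reference = [0.0] * 20
reference[5:8] = [1.0, 3.0, 1.0]      # characteristic pattern at samples 5..7
delayed = [0.0] * 20
delayed[8:11] = [1.0, 3.0, 1.0]       # same pattern, captured 3 samples late
aligned = align(reference, delayed, max_shift=5)
```

After alignment, the pattern in the delayed trace sits at the same sample indices as
in the reference, so point-wise statistics across traces are meaningful again.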


Several techniques can overcome temporal desynchronization countermeasures.
Mangard et al. discuss a number of techniques to align trace data [MOP07]. Akkar et
al. suggested pre-processing signals where temporal desynchronization is present and
normalizing the individual traces by stretching or compressing the samples within
each clock period [ABDM00]. After pre-processing, DSCA techniques are applied
as normal. One technique for accomplishing this was introduced by van Woudenberg
et al. under the name elastic alignment [vWWB11]. Some investigators have also
indicated that simply pre-processing the captured traces to convert time-domain
signals to the frequency domain (via Fast Fourier Transform) and performing all
subsequent analysis in the frequency domain will significantly reduce any impact of
misalignment [RO04b, GHT05, PHF09].
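
The reason frequency-domain analysis tolerates misalignment is that a time shift
changes only the phase of each frequency component, not its magnitude. The sketch
below demonstrates this with a direct discrete Fourier transform; the trace values
are made up, and the property holds exactly only for circular shifts (for real,
finite captures it holds approximately). A library FFT would be used in practice.

```python
import cmath

def dft_magnitude(samples):
    """Magnitude spectrum via a direct O(n^2) discrete Fourier transform."""
    n = len(samples)
    return [abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(samples)))
            for k in range(n)]

trace = [0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 2.0, 0.0]
shifted = trace[-3:] + trace[:-3]          # same trace, circularly delayed by 3

spectrum = dft_magnitude(trace)
spectrum_shifted = dft_magnitude(shifted)
max_difference = max(abs(a - b) for a, b in zip(spectrum, spectrum_shifted))
print("largest magnitude difference:", max_difference)
```

The cost of this robustness is that the discarded phase also carried timing
information, so the approach trades temporal resolution for shift tolerance.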


Clavier et al. introduced an attack variation that overcomes the randomization of
operation duration, which they termed sliding window DPA [CCD00]. The attack
works by conducting a DPA attack as normal, and post-processing the resulting
statistical measure (e.g., correlation coefficients) by integrating it over a fixed
window of time, effectively restoring the amplitude of the characteristic peak
values which indicate a correct key hypothesis.
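
The windowed post-processing step can be sketched in a few lines. This is a
simplified illustration rather than the exact procedure of [CCD00]: the function
name, the window length, and the example differential trace (a peak of height 1.0
smeared by random delays into four peaks of height 0.25) are assumptions.

```python
def sliding_window_sum(differential_trace, window):
    """Sum every run of `window` consecutive samples of a DPA result."""
    return [sum(differential_trace[i:i + window])
            for i in range(len(differential_trace) - window + 1)]

# Random delays spread the correct-key peak across four possible time offsets.
smeared = [0.0] * 20
for position in (10, 11, 12, 13):
    smeared[position] = 0.25

restored = sliding_window_sum(smeared, window=4)
peak = max(restored)
print("peak before:", max(smeared), "peak after:", peak)
```

The window length should match the spread introduced by the countermeasure: a
shorter window recovers only part of the peak, while a longer one also accumulates
more noise.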

Numerous  other  applications  of  signal  processing  have  been  investigated.  Since 
SCA  is,  in  essence,  a  signal  detection/estimation/classification  problem,  it  is  likely 
that well-known signal processing techniques from other fields (communications
systems, biomedical signal processing, etc.) can be adapted for SCA.

2.8 Summary

This  chapter  summarized  the  relevant  work  and  technical  background  necessary 
to  the  study  of  side  channel  analysis  and  information  leakage  from  integrated  circuits. 
This  information  supports  the  subsequent  material  presented  in  the  remainder  of  this 
document. 
3. Intrinsic Physical Layer Authentication of Integrated Circuits


This chapter contains the text of an article that was submitted to and accepted for
publication in the Institute of Electrical and Electronics Engineers (IEEE)
Transactions on Information Forensics and Security [CLB+11]. This article was
co-authored by Mr. Eric Laspe, Dr. Rusty Baldwin, Dr. Michael Temple, and Dr. Yong Kim.


3.1  Abstract 

RF distinct native attribute (RF-DNA) fingerprinting is adapted as a physical
layer technique to improve the security of integrated circuit (IC)-based multi-factor
authentication systems. Device recognition tasks (both identification and
verification) are accomplished by passively monitoring and exploiting the intrinsic
features of an IC’s unintentional RF emissions without requiring any modification to
the device being analyzed. Device discrimination is achieved using RF-DNA
fingerprints composed of higher-order statistical features based on instantaneous
amplitude, phase and frequency responses as a device executes a sequence of operations.
The recognition system is trained using Multiple Discriminant Analysis to reduce
data dimensionality while retaining class separability, and the resultant fingerprints
are classified using a linear Bayesian classifier. Demonstrated identification and
verification performance includes average identification accuracy of greater than
99.5% and equal error rates of less than 0.05% for 40 near-identical devices.
Depending on the level of required classification accuracy, RF-DNA fingerprint based
authentication is well-suited for implementation as a countermeasure to device
cloning, and is promising for use in a wide variety of related security problems.


3.2  Introduction 

Physical implementation attacks on secure electronic systems have evolved
rapidly in the past few years, making it increasingly difficult for new
countermeasures and security practices to keep pace [BMV05, TJ09]. In contrast to
mathematical cryptanalytic attacks, which are typically hypothetical in nature,
implementation attacks present a serious and immediate threat since the strength of
the underlying algorithms and protocols is rendered largely irrelevant. Examples
of implementation attacks range from complex techniques requiring expensive and
highly specialized equipment (e.g., laser fault injection or focused ion beam
manipulation) to surprisingly simple, low-cost attacks targeting the unintentional
information leakage produced by devices during normal operation (e.g., simple power
analysis) [BMV05, TJ09].


An extensive body of academic and commercial research has been dedicated to
examining the physical security of cryptographic and other secure devices. This work
has emerged in the last decade under the titles side channel analysis and fault
analysis (cf. [BMV05, AARR02, KJJ99, ZF05]). Given that many implementation attacks
are well within the reach of even modestly funded and minimally equipped
individuals, they should be given serious practical consideration when designing
modern systems. A prudent design approach is to 1) assume that secure tokens or other
essential system components are subject to counterfeiting, cloning, or sensitive data
extraction, and 2) take appropriate steps to mitigate the associated risks as part of
an integrated, multi-tiered system security architecture.

RF “distinct native attribute” (RF-DNA) fingerprinting (cf. [SITMM08, KTM09,
RTM10, RPT11, HBK06, WMTM10, WTR10, CGB+10]) is adopted herein as a way to
augment existing multi-factor authentication schemes via physical layer
authentication at the device level to counter cloning and related threats. The term
RF-DNA is used to embody the coloration of RF emissions (both intentional and
unintentional) induced by the intrinsic physical attributes of a unique device. Only
RF emissions produced by unintentional emitters such as integrated circuits (ICs) are
considered in this study.
Using the RF-DNA approach, semiconductor-based IC devices are passively
recognized based on discriminating features (RF-DNA fingerprints) extracted from
their intrinsic physical properties in a manner analogous to biometric human
identification. Because this technique exploits emissions caused by intrinsic
inter-device variability, it is suitable for a variety of security applications
involving commodity commercial ICs, and does not require any physical device
modifications. Moreover, our initial results indicate the technique can be adapted to
work with existing processes and protocols, and is likely suitable for use with a
wide variety of IC devices, e.g., general purpose microcontrollers, programmable
logic devices such as FPGAs, and custom ASICs. To our knowledge, the work presented
here and in [CGB+10] is the first to propose using the intrinsic DNA of unintentional
emissions for IC recognition.


This work makes a number of distinct contributions while expanding on the
initial proof-of-concept results in [CGB+10] with an extensive empirical evaluation.
In particular, previous RF-DNA work has predominantly considered device
identification tasks. However, the primary use case envisioned for IC fingerprinting
is to counter cloning and related threats, which requires identity verification.
Herein, a systematic approach is developed and introduced to evaluate RF-DNA
fingerprinting effectiveness in the context of both identification and verification
tasks for arbitrary n-class problems. Performance of the proposed technique is
evaluated under a wide range of simulated noise conditions, and empirical results are
presented indicating the RF-DNA technique performance scales well for identification
and verification tasks involving 40 near-identical devices.


3.3  Problem  Definition 

This  work  assesses  the  suitability  of  RF-DNA  fingerprinting  for  two  distinct, 
but closely related device recognition tasks: identification and verification. These
tasks are analogous to human recognition tasks for which biometric-based pattern
recognition systems are frequently used [JRP04]:


1.  Device  identification.  The  recognition  system  determines  a  device’s  identity  by 
comparing  a  captured  device  fingerprint  with  reference  fingerprint  templates 
for  all  known  devices.  Identification  requires  a  one-to-many  comparison  and  is 
considered  more  difficult  than  verification. 

2. Device verification. The recognition system checks the authenticity of a
device’s claimed identity (by virtue of the digital credentials presented) using a
one-to-one comparison. As with biometric verification, the objective of physical
layer device verification is to prevent two devices from using the same identity.
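
The difference between the two tasks can be made concrete with a small sketch. It is
illustrative only: Euclidean distance to a stored template stands in for the
classifier used in this work, and the device names, template values, and acceptance
threshold are hypothetical.

```python
import math

def distance(a, b):
    """Euclidean distance between two fingerprints of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def identify(fingerprint, templates):
    """One-to-many: return the enrolled device whose template is closest."""
    return min(templates, key=lambda device: distance(fingerprint, templates[device]))

def verify(fingerprint, claimed_id, templates, threshold):
    """One-to-one: accept only if the fingerprint matches the claimed template."""
    return distance(fingerprint, templates[claimed_id]) <= threshold

templates = {"A1": [0.0, 0.0], "B2": [3.0, 4.0]}   # hypothetical reference templates
captured = [0.2, -0.1]                             # fingerprint from presented device

print(identify(captured, templates))
print(verify(captured, "A1", templates, threshold=1.0))
print(verify(captured, "B2", templates, threshold=1.0))
```

Identification always returns some enrolled device, even for an unknown one, whereas
verification can reject. The threshold sets the trade-off between false accepts and
false rejects, which is what the equal error rate summarizes.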

Previous RF-DNA fingerprinting work predominantly focused on the one-to-many
identification task in the context of wireless network security, where a device
entering a network needs to be verified as belonging to a pool of authorized devices.
However, detection of cloned security tokens, such as smart-card based identification
cards or payment devices, requires one-to-one verification that the claimed identity
of the device matches its physical fingerprint. Herein, the suitability of RF-DNA
fingerprints for use by physical layer device recognition systems is assessed for both
identification and verification tasks.


3.4 Notional Physical Layer Device Authentication System Design

This  section  describes  a  system  design  for  applying  RF-DNA  fingerprinting  to 
the device identification and verification problems described above. The basic design
is modeled after a typical biometric system [JRP04], and includes five modules:


•  Sensor module. The sensor module captures unintentional RF emissions, and
is composed of an RF receiver and a near-field probe. Details of the particular
sensor module used to obtain the experimental data are described in Sec. 3.8.2.

•  Feature extraction module. Features are generated based on the statistical
behavior of one or more instantaneous response(s) within pre-defined fixed
signal regions. Both the device enrollment and the feature matching processes
use the feature extraction module to generate a statistical fingerprint for each
captured signal as described in Sec. 3.7.


•  Classifier training module. Extracted features are post-processed using
dimensionality reduction and probability density estimation techniques to generate
a reference template for each enrolled device.

•  System database module. As each device is enrolled, the set of training
fingerprints and associated reference template are stored in a verification database.
As each device is issued or associated with a particular digital identity, the
database is updated to reflect the pairing (e.g., device A1 belongs to John
Smith).

•  Classification / feature matching module. For recognition tasks, one or more
fingerprints are extracted from the presented device. The extracted fingerprint(s)
are compared to the stored reference templates in the system database
to either identify the device or verify its presented identity.


The  basic  steps  involved  in  fingerprinting  each  device  at  enrollment  are: 


1.  Command  the  device  to  execute  a  short  pre-defined  sequence  of  operations  (the 
challenge). 

2.  Capture  the  unintentional  near-field  RF  emissions  produced  by  the  device  as 
it  executes  the  operation  sequence  (the  response). 

3.  Extract  discriminating  features  from  the  captured  emissions  to  produce  an 
RF-DNA  fingerprint. 

4.  Repeat the above steps Np times to obtain a set of training fingerprints.

5.  Process the set of extracted training fingerprints to generate a reference
template for each enrolled device.

6.  Repeat the above steps for each of Ncr challenge-response sequences.

7.  Store  each  set  of  training  fingerprints  and  generated  reference  template  in  a 
database. 

8.  When  the  physical  device  is  issued  to  an  individual  or  paired  with  a  set  of  digital 
credentials,  update  the  database  to  associate  the  stored  reference  template  with 
those  digital  credentials. 
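
The enrollment steps above can be sketched end to end with a simulated device. The
sketch makes several simplifying assumptions that are not the method of this work:
the fingerprint uses variance, skewness, and kurtosis of raw samples in equal-length
regions, the reference template is simply the mean training fingerprint (rather than
the dimensionality reduction and density estimation described above), and
make_device is a hypothetical emission simulator.

```python
import random
import statistics

def region_features(samples):
    """Variance, skewness, and kurtosis of one signal region."""
    mu = statistics.fmean(samples)
    var = statistics.pvariance(samples, mu)
    sd = var ** 0.5
    skew = sum(((x - mu) / sd) ** 3 for x in samples) / len(samples)
    kurt = sum(((x - mu) / sd) ** 4 for x in samples) / len(samples)
    return [var, skew, kurt]

def fingerprint(response, n_regions):
    """Concatenate the statistics of n_regions equal-length signal regions."""
    size = len(response) // n_regions
    features = []
    for r in range(n_regions):
        features.extend(region_features(response[r * size:(r + 1) * size]))
    return features

def enroll(capture_response, n_repeats, n_regions):
    """Steps 1-5: repeat challenge/response, then build a reference template."""
    prints = [fingerprint(capture_response(), n_regions) for _ in range(n_repeats)]
    template = [statistics.fmean(column) for column in zip(*prints)]
    return prints, template

def make_device(gain, seed):
    """Hypothetical device: a fixed waveform scaled by a device-specific gain."""
    rng = random.Random(seed)
    def capture_response():
        return [gain * ((i % 8) - 3.5) + rng.gauss(0, 0.2) for i in range(64)]
    return capture_response

prints_a, template_a = enroll(make_device(1.0, seed=1), n_repeats=10, n_regions=4)
prints_b, template_b = enroll(make_device(1.2, seed=2), n_repeats=10, n_regions=4)
database = {"A1": template_a, "B2": template_b}   # step 7: store the templates
```

Here the device-specific gain plays the role of process variation: the two simulated
devices run the same operation sequence but yield visibly different templates.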

In the context of a cryptographic challenge-response system, the command to
execute a particular operation sequence is the challenge and the unintentional RF
emissions produced by the device while executing the sequence are the response.

The  operation  sequence  used  can  be  composed  of  any  fixed,  repeatable  process. 
For  general  purpose  microcontrollers,  the  operation  sequence  would  be  a  series  of 
microcode  instructions  (e.g.,  some  combination  of  loop,  branch,  control,  or  arithmetic 
commands)  on  known  (fixed)  data.  For  programmable  logic  devices  or  ASICs,  the 
sequence  could  be  composed  of  several  clock  cycles  where  different  combinational 
logic  paths  are  activated,  starting  from  a  known  fixed  configuration.  When  possible, 
device  configuration  (clock  rate,  on-chip  peripheral  status,  etc.)  should  be  held  or 
reset  to  a  known  state  prior  to  beginning  the  operation  sequence. 

After enrollment, subsequent device recognition tasks are performed by
repeating the challenge-response protocol to obtain an authentication fingerprint. The
authentication fingerprint is processed by the classification / feature matching
module for identification or verification. Although the experiments conducted herein
are limited to a single challenge-response sequence (Ncr = 1), extension of the
approach to an arbitrary number of challenge-response pairs is straightforward.

An explicit RF-DNA challenge-response procedure is unnecessary for many
envisioned applications because the RF-DNA fingerprinting procedure can be
piggybacked on existing protocols. For example, it is typical to initiate
communications with smart-cards by issuing a reset command to the card and parsing the
resulting answer to reset (ATR) response. A straightforward RF-DNA implementation
might generate a fingerprint from the emissions captured during a small portion of
the ATR response. Other opportunistic responses for fingerprinting in smart-card
authentication include PIN verification or credential retrieval. Similar
opportunistic approaches can be envisioned for identification and verification tasks
based on microcontrollers, programmable logic devices such as FPGAs, and ASIC-based
devices.


3.5  Unintentional  RF  Emissions  of  ICs 

It  is  common  knowledge  that  electronic  equipment  radiates  electromagnetic 
(EM)  energy  that  can  interfere  with  nearby  devices.  It  is  for  this  reason  that  airline 
passengers  are  required  to  “turn  off  all  portable  electronic  devices,”  and  consumer 
electronics undergo certification testing for compliance with Federal Communications
Commission (FCC) [FCC09] or other regulatory standards. Digital devices that
incorporate  high  frequency  clocks,  oscillators,  etc.  are  specifically  regulated  as  known 
unintentional  emitters  and  require  strict  testing  to  ensure  emissions  do  not  exceed 
tolerable  levels. 


Variations  in  current  flow  through  a  device  due  to  clock  distribution,  transistor 
switching,  and  other  IC  component  activity  produce  EM  fields  that  combine  through 
complex  interactions  and  propagate  via  both  radiation  and  conduction  in  the  form  of 
time-varying  EM  waves.  The  fundamental  nature  of  these  effects  is  well  understood 
and is described by Maxwell’s equations [AARR02].


In the past decade there has been a growing realization that unintentional
emissions are not only a source of interference, but also a useful source of
information about the internal state of the emission-producing device [KJJ99, AARR02,
BMV05, ZF05, PEK+09]. This has had profound implications for the physical security of
sensitive electronic systems since in many instances the leaked state information is
sufficient to infer precise details about the operations the device is performing
and/or the data it is processing. More recently, it was shown in [CGB+10] that in
addition to data and operation-dependent characteristics, the unintentional
near-field RF emissions of individual ICs also exhibit significant device-dependent
characteristics.

The most likely source of inter-device emission variability is the random process
variations introduced during die fabrication and packaging [Ver10]. Although IC
fabrication  processes  are  necessarily  precise,  structural  variations  are  still  introduced 
in  the  final  device  structure  on  a  very  small  scale  (deep  sub-micron  in  modern  IC 
technology).  As  a  result,  no  two  chips  are  exactly  alike.  As  long  as  process-induced 
variations  are  within  acceptable  tolerances,  the  device  will  operate  correctly  from  a 
black-box  functional  perspective. 


For  this  study,  the  hypothesis  is  that  the  fabrication  process-induced  variations 
in  each  individual  chip’s  electrical  properties  color  the  unintentional  RF  emissions 
from  the  circuit,  and  that  the  resultant  coloration  is  sufficient  to  uniquely  identify 
the  emissions’  source.  Although  only  unintentional  RF  emissions  are  studied  herein, 
the  approach  used  is  believed  to  be  applicable  to  other  side  channel  emissions  such 
as  variations  in  the  power  consumption  of  the  device. 


3.6 Related Work


Various methods have been proposed to use the uniqueness of inter-device
process variations to enhance security. Previously proposed physical layer device
recognition techniques include physical unclonable functions (PUFs) [Ver10, PRTG02],
RF certificates of authenticity (RF-COAs) [DK07], and the exploitation (i.e., RF
fingerprinting) of unique signal coloration within intentional emissions produced by
wireless networking [SITMM08, KTM09, RTM10, HBK06, RPT11, DC09] and RFID-based
devices [DHBC09].


3.6.1 Physical Unclonable Functions (PUF). PUF techniques refer to two
distinct approaches for device authentication. The first augments an IC with specialized
internal measurement circuitry that computes a one-way function from glitch
counts, propagation delays, or other electrical properties that vary randomly with
intrinsic process variations of the IC [Ver10, MKP09]. A second approach combines
a grid of capacitive sensors integrated into the top metal layer of the IC with a conformal
coating doped with randomly distributed dielectric particles applied on top of
the IC’s passivation layer. The conformal coating requires active interrogation (i.e.,
application of a specified voltage with known amplitude and frequency) and internal
measurement of the response by the circuit [Ver10].


3.6.2 RF Certificates of Authenticity (RF-COA). The RF-COA technique
attaches a three-dimensional constellation of small, randomly shaped conductive or
dielectric objects to an RFID device. This is similar to a PUF coating except that
both the interrogation and response measurement are carried out by an external
RFID reader. The RFID reader is modified to include a dense matrix of patch
antennae to transmit and receive high-frequency RF signals. The reader interrogates
a modified RFID object and extracts a fingerprint to compute a COA [DK07].


3.6.3 RF-DNA Fingerprinting. RF fingerprinting has been proposed as a
physical layer technique to enhance the security of various wireless communications
devices (e.g., RFID [DHBC09], 802.11 WiFi [RPT11], 802.15 WPAN [HBK06, DC09],
802.16 WiMAX [WMTM10], and GSM [WTR10]). The device fingerprinting methodology
developed herein is specifically based on previous RF-DNA work in [SITMM08,
KTM09, RTM10]. The preliminary results in [CGB+10] confirmed that the general
approach was promising for recognition of ICs based on their unintentional emissions.


A significant advantage of RF-DNA fingerprinting compared to PUF and RF-COA
techniques is its applicability to the authentication of any commodity IC without
modifications to the internal circuitry or application of an external coating.
Additionally, measurement of the device response is passive and does not require a
transmitter.

Figure 3.1 RF-DNA statistical fingerprint generation process (blocks: Fingerprint
Regions; Instantaneous Signal Response Generation; Statistical Feature Generation).


To implement RF-DNA fingerprinting, the sensor module described in Sec. 3.4
would be integrated into the terminal or device reader. This approach is believed to
be reasonable given that space and power constraints are generally less restrictive
in a reader (e.g., smart-card reader or ATM) than in the secure token or
device itself.


3.7  RF-DNA  Fingerprint  Generation  and  Classification 

3.7.1 RF-DNA Feature Extraction and Statistical Fingerprint Generation.
The statistical fingerprint generation methodology used herein is based on [SITMM08]
with modifications made to 1) extend the process from the limited 3-class to general
N-class problems, and 2) enable both identification and verification device recognition
tasks. RF-DNA statistical fingerprint feature vectors, F, are extracted from
real-valued time-domain samples based on the statistical behavior of instantaneous
signal response(s) within pre-selected response regions. The complete fingerprint
extraction and generation process is shown in Fig. 3.1.

Three instantaneous signal responses are generated from the real-valued time-domain
samples: instantaneous amplitude (IA) given by $a(n)$, instantaneous phase
(IP) given by $\phi(n)$, and instantaneous frequency (IF) given by $f(n)$. To calculate
$\phi(n)$ and $f(n)$, the real-valued signal samples are first converted to I-Q samples,
$s_C(n) = s_I(n) + j s_Q(n)$, using a Hilbert transform [Lyo04]. The IP samples are
calculated using

\[ \phi(n) = \tan^{-1}\left[ \frac{s_Q(n)}{s_I(n)} \right], \tag{3.1} \]

with the corresponding IF (Hz) given by

\[ f(n) = \frac{1}{2\pi} \frac{d\phi(n)}{dt}. \tag{3.2} \]


The resultant IA and IF responses are “centered” using the amplitude ($\mu_a$)
and frequency ($\mu_f$) means to remove potential collection biases

\[ a_c(n) = a(n) - \mu_a, \tag{3.3} \]

\[ f_c(n) = f(n) - \mu_f. \tag{3.4} \]

Finally, the centered responses in (3.3) and (3.4) are normalized by their respective
maximum magnitudes to compensate for power variation.
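The response generation, centering, and normalization steps in (3.1) through (3.4) can be illustrated with a minimal sketch. It is not the processing chain used for the reported results: the function names are hypothetical, the Hilbert transform is realized with a naive O(N^2) DFT so that only the Python standard library is needed, and the phase derivative in (3.2) is approximated by a first difference scaled by an assumed sample rate `fs`.

```python
import cmath
import math


def analytic_signal(x):
    """Return I-Q samples s_I(n) + j*s_Q(n) for a real-valued sequence x.

    Assumption: a naive O(N^2) DFT stands in for an FFT-based Hilbert
    transform; it is only practical for short illustrative sequences.
    """
    n = len(x)
    spectrum = [sum(x[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n))
                for m in range(n)]
    # One-sided weighting: keep DC (and Nyquist when n is even),
    # double the positive-frequency bins, zero the negative-frequency bins.
    weights = [0.0] * n
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
    for m in range(1, (n + 1) // 2):
        weights[m] = 2.0
    return [sum(weights[m] * spectrum[m] * cmath.exp(2j * math.pi * m * k / n)
                for m in range(n)) / n
            for k in range(n)]


def instantaneous_responses(x, fs):
    """Return (IA, IP, IF) sequences; IF is in Hz and has one fewer sample."""
    z = analytic_signal(x)
    ia = [abs(v) for v in z]
    ip = [math.atan2(v.imag, v.real) for v in z]  # four-quadrant form of (3.1)
    # Discrete form of (3.2): the angle of z[n+1] * conj(z[n]) is the phase
    # increment per sample, which avoids explicit phase unwrapping.
    inst_f = [cmath.phase(z[k + 1] * z[k].conjugate()) * fs / (2 * math.pi)
              for k in range(len(z) - 1)]
    return ia, ip, inst_f


def center_and_normalize(seq):
    """Subtract the mean, then divide by the maximum magnitude."""
    mean = sum(seq) / len(seq)
    centered = [v - mean for v in seq]
    peak = max(abs(v) for v in centered)
    # Assumption: an all-constant response is returned as zeros, not scaled.
    return centered if peak == 0 else [v / peak for v in centered]
```

For a pure tone the sketch returns a constant amplitude and a constant frequency equal to the tone frequency, which is a convenient sanity check before applying it to collected data.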


3.7.1.1 Statistical Fingerprint Generation. After centering and normalization,
$N_f = 4$ statistical features are generated within each selected region of
each instantaneous response: standard deviation ($\sigma$), variance ($\sigma^2$), skewness ($\gamma$),
and kurtosis ($\kappa$) [SITMM08]. For an arbitrary centered and normalized sequence
$\{x_c(n)\}$ having $N_x$ samples, these features are

\[ \sigma^2 = \frac{1}{N_x} \sum_{n=1}^{N_x} \left( x_c(n) - \mu \right)^2, \tag{3.5} \]

\[ \gamma = \frac{1}{N_x \sigma^3} \sum_{n=1}^{N_x} \left( x_c(n) - \mu \right)^3, \quad \text{and} \tag{3.6} \]

\[ \kappa = \frac{1}{N_x \sigma^4} \sum_{n=1}^{N_x} \left( x_c(n) - \mu \right)^4, \tag{3.7} \]

where $\mu$ is the mean of $\{x_c(n)\}$ over the region and the standard deviation ($\sigma$) is $\sqrt{\sigma^2}$.
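As an illustration of the four statistical features in (3.5) through (3.7), the following minimal sketch computes them for one region using the standard population definitions. The function name and the handling of a zero-variance region are assumptions made for the example, not details taken from the original study.

```python
import math


def region_statistics(x):
    """Return [sigma, sigma^2, skewness, kurtosis] for one response region."""
    n = len(x)
    mu = sum(x) / n
    var = sum((v - mu) ** 2 for v in x) / n
    if var == 0:
        # Assumption: a constant region yields all-zero features rather
        # than a division by zero in the skewness and kurtosis terms.
        return [0.0, 0.0, 0.0, 0.0]
    sigma = math.sqrt(var)
    skew = sum((v - mu) ** 3 for v in x) / (n * sigma ** 3)
    kurt = sum((v - mu) ** 4 for v in x) / (n * sigma ** 4)
    return [sigma, var, skew, kurt]
```

For the symmetric sequence 1, 2, 3, 4, 5 the variance is 2, the skewness is 0, and the kurtosis is 34/20 = 1.7, which can be checked by hand against the definitions.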


For all results presented in Sec. 4.6, each statistic was calculated over $N_R = 32$
equal-length, contiguous sub-regions spanning a total region of interest (ROI). The
ROI was empirically selected from the collected signal and contains $N_{CL} = 32$ clock
cycles of device operations. Extensive pilot studies confirmed that, for the parameter
combinations studied, partitioning samples into sub-regions corresponding to integer
multiples of the number of clock cycles in the ROI yielded statistically superior results
relative to partitioning based on fractional clock cycles. The full ROI encompassing
all $N_{CL} = 32$ clock cycles was used as an additional “total” region, giving
$(N_R + 1) = 33$ total regional contributions for each device. Fig. 3.2 illustrates the
sub-region allocation process used herein.

Figure 3.2 Mean amplitude response of signals collected for all devices from part
numbers A and B (see Tab. 3.2). The x-axis shows sub-region allocation
using $N_R = (N_{CL} + 1) = 33$ sub-regions for calculation of
RF-DNA statistical fingerprints as described in Sec. 3.7.1.1.

For each sub-region and the “total” region, the four statistics are concatenated
to form a regional RF-DNA marker vector

\[ F_{R_i} = \left[ \sigma_{R_i} \;\; \sigma^2_{R_i} \;\; \gamma_{R_i} \;\; \kappa_{R_i} \right]_{1 \times 4}, \tag{3.8} \]

where $i \in \{1, 2, \ldots, (N_R + 1)\}$. The RF-DNA marker vectors (3.8) are concatenated
to form a composite characteristic vector for each selected characteristic

\[ F^C = \left[ F_{R_1} \,\vdots\, F_{R_2} \,\vdots\, F_{R_3} \,\cdots\, F_{R_{N_R+1}} \right]_{1 \times [4(N_R+1)]}, \tag{3.9} \]


where the superscripted C denotes a specific characteristic response, i.e., $a$, $\phi$, or $f$.
Considering IA, IP, and IF, the final statistical fingerprint for each signal is a vector
of $N_f \cdot (N_R + 1) \cdot N_I = 4 \cdot 33 \cdot 3 = 396$ total elements, or

\[ F = \left[ F^a \,\vdots\, F^\phi \,\vdots\, F^f \right]_{1 \times 396}. \tag{3.10} \]

Figure 3.3 Average of 500 RF-DNA fingerprints for each of the 40 tested devices.
Fingerprints are composed of statistical features extracted from $N_{CL} = 32$
clock cycles (at collected SNR). The x-axis is the alpha-numeric
designator for each chip (see Table 3.2).
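The assembly of regional markers into a 396-element fingerprint, per (3.8) through (3.10), can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, the three responses are assumed to be already centered and normalized, and each response length is assumed to be divisible by the number of sub-regions (any trailing remainder is simply left out of the sub-regions).

```python
import math


def region_statistics(x):
    """Return [sigma, sigma^2, skewness, kurtosis] for one region."""
    n = len(x)
    mu = sum(x) / n
    var = sum((v - mu) ** 2 for v in x) / n
    if var == 0:
        return [0.0, 0.0, 0.0, 0.0]  # assumption: constant region -> zeros
    sigma = math.sqrt(var)
    skew = sum((v - mu) ** 3 for v in x) / (n * sigma ** 3)
    kurt = sum((v - mu) ** 4 for v in x) / (n * sigma ** 4)
    return [sigma, var, skew, kurt]


def characteristic_vector(response, n_regions=32):
    """Concatenate markers from n_regions sub-regions plus the total region."""
    size = len(response) // n_regions  # equal-length, contiguous sub-regions
    features = []
    for i in range(n_regions):
        features.extend(region_statistics(response[i * size:(i + 1) * size]))
    features.extend(region_statistics(response))  # the additional "total" region
    return features  # 4 * (n_regions + 1) elements


def fingerprint(ia, ip, inst_f, n_regions=32):
    """Concatenate the IA, IP, and IF characteristic vectors."""
    return (characteristic_vector(ia, n_regions)
            + characteristic_vector(ip, n_regions)
            + characteristic_vector(inst_f, n_regions))
```

With the default of 32 sub-regions, each characteristic vector has 4 x 33 = 132 elements and the full fingerprint has 396, matching the dimensions stated in the text.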

Finally, the training matrix composed of $N_{FP}$ separately collected statistical
fingerprints is

\[ F_T = \begin{bmatrix} F_1 \\ F_2 \\ \vdots \\ F_{N_{FP}} \end{bmatrix}_{N_{FP} \times 396}, \tag{3.11} \]

which is input to the classification training process described in Sec. 3.7.2.


The RF-DNA fingerprints shown in Fig. 3.3 illustrate intra- and inter-part
number variability in the statistical features generated for 40 unique devices. For
visual clarity, the RF-DNA markers in F are scaled, compressed and/or expanded,
and quantized such that the plotted data spans the interval [0, 1] within each statistic.
The quantized markers are stacked vertically to create an electrophoresis-like plot.


The RF-DNA plot is helpful in developing an intuitive understanding of which
statistical features exhibit the most variance both within and across part numbers. It
is expected that the classifier will have the greatest difficulty distinguishing between
parts that have visually similar RF-DNA fingerprints. For example, the RF-DNA
plots for devices A4, A6, and A8 appear to be very similar across all features. These
devices are expected to be confused more frequently than devices that look substantially
different across one or more statistical features. Device A9, on the other hand,
appears quite unique and would generally be expected to be less confused with other
devices. Likewise, the large apparent differences between the Class A chips and the
chips in all other classes imply that a classifier should be able to distinguish Class
A chips from the other classes more easily.


3.7.2 Classifier Training. Consistent with previous RF-DNA fingerprinting
work, training of the classification system is accomplished using multiple discriminant
analysis (MDA) to reduce feature dimensionality and improve class separability.
MDA is an extension of Fisher’s (two-class) linear discriminant analysis (LDA) to
N classes that linearly transforms the sample points into an $(N - 1)$-dimensional
subspace without reducing the class separability power [TK09]. The MDA projection
maximizes the ratio between inter-class distance and intra-class variance. For
all results presented herein, input fingerprint data is projected from the original
396-dimensional data into a compressed $(N_D - 1) = 39$-dimensional space.

Given input data matrix X, the MDA transformation first finds the within
(intra-) ($S_w$) and between (inter-) ($S_b$) class scatter matrices [TK09]

\[ S_w = \sum_{i=1}^{N_D} P_i \Sigma_i, \tag{3.12} \]

\[ S_b = \sum_{i=1}^{N_D} P_i \left( \mu_i - \mu_0 \right) \left( \mu_i - \mu_0 \right)^T, \tag{3.13} \]

where $\Sigma_i$ is the covariance matrix for class $c_i$, $\mu_i$ is the mean vector of class $c_i$,
$\mu_0$ is the global mean vector, and $P_i$ is the prior probability of class
$c_i$ (assumed to be equal for all classes).

Projection matrix W is formed from the $(N - 1)$ eigenvectors of $S_w^{-1} S_b$. The
formation of W maximizes the ratio of inter-class distance to intra-class
variance [TK09]. Individual fingerprints are projected onto the $(N - 1)$-dimensional
MDA space by

\[ F^W = W^T F. \tag{3.14} \]
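To make the projection step concrete, the sketch below implements Fisher's two-class discriminant, the special case that MDA generalizes, for two-dimensional features with equal priors. For two classes the single discriminant direction is proportional to the inverse within-class scatter matrix applied to the difference of the class means. The function names and the restriction to two dimensions are assumptions chosen to keep the example free of external libraries; the 40-class, 396-dimensional case in the text would require a general eigen-decomposition.

```python
def mean_vector(points):
    """Mean of a list of 2-D points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(2)]


def covariance(points):
    """Population covariance matrix (2 x 2) of a list of 2-D points."""
    mu = mean_vector(points)
    n = len(points)
    c = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - mu[0], p[1] - mu[1]]
        for r in range(2):
            for s in range(2):
                c[r][s] += d[r] * d[s] / n
    return c


def fisher_direction(class_a, class_b):
    """Return w = S_w^{-1} (mu_b - mu_a) with equal priors P_i = 1/2."""
    ca, cb = covariance(class_a), covariance(class_b)
    sw = [[0.5 * (ca[r][s] + cb[r][s]) for s in range(2)] for r in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    ma, mb = mean_vector(class_a), mean_vector(class_b)
    diff = [mb[0] - ma[0], mb[1] - ma[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]


def project(w, point):
    """Project one 2-D feature vector onto the 1-D discriminant axis."""
    return w[0] * point[0] + w[1] * point[1]
```

In this reduced setting, two classes that differ only along the first feature yield a direction with no component along the second feature, and their projections do not overlap.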

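For all results presented herein, the full MDA-projected fingerprint training
matrix is a combination of $N_{FP}$ MDA-projected training fingerprints, each with
$(N_D - 1) = 39$ elements

\[ F_T^W = \begin{bmatrix} F_1^W \\ F_2^W \\ \vdots \\ F_{N_{FP}}^W \end{bmatrix}_{N_{FP} \times 39}. \tag{3.15} \]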


Classifier training is completed by fitting a multivariate normal distribution to
the MDA-projected data and saving the estimated distribution parameters (mean
vector, $\mu_i^W$, and covariance matrix, $\Sigma^W$) for each class. For the linear Bayesian
classifier used herein, a pooled estimate of the covariance matrix is used in lieu of
individual covariance matrices for each class [TK09]. Note these parameters are for
the distribution fit to the training data after MDA projection, as indicated by the
superscripted W.


The complete training process produces the following:

• MDA projection matrix ($W$)

• $N_D$ sets of MDA-projected training fingerprints ($F_T^W$) (one set for each device)

• $N_D$ estimated mean vectors ($\mu_i^W$)

• A pooled estimate of the covariance matrix ($\Sigma^W$)

The pair of parameters composed of the mean vector and covariance matrix
for each device is the reference template for that device.


3.7.3 Fingerprint Classification. As described in Sec. 3.3, RF-DNA statistical
fingerprints can be used for both identification and verification tasks. For both
applications, the unintentional RF emissions of a target device are collected and the
relevant statistical features are extracted and projected into the MDA space using
(3.14) to generate a projected device fingerprint ($F^W$). The classification technique
and performance measures used to evaluate system performance are different for
each application and are described below.


3.7.3.1 Device Identification. For device identification, a captured
fingerprint is compared to the reference templates of all devices in a verification
database (one-to-many comparison) using some similarity criterion. The identity
of the target device is determined by computing the similarity score for each comparison
and selecting the best match. The similarity measure used herein is the
Bayesian posterior probability under assumptions of equal prior probabilities and
equal costs. This approach is optimal for the minimization of classification error
probability [TK09].

Stated formally, for the case with $N_D$ devices, a device fingerprint $F^W$ is assigned
to class $\omega_i$, where $i \in \{1, 2, \ldots, N_D\}$, if

\[ P(\omega_i \mid F^W) > P(\omega_j \mid F^W) \quad \forall\, j \neq i, \tag{3.16} \]

where $P(\omega_i \mid F^W)$ is the conditional posterior probability that $F^W$ belongs to class
$\omega_i$. According to Bayes’ rule, the conditional probability is [Mac03]

\[ P(\omega_i \mid F^W) = \frac{P(F^W \mid \omega_i)\, P(\omega_i)}{P(F^W)}. \tag{3.17} \]

For a given fingerprint $F^W$, the denominator is constant across all $\omega_i$ and can be
neglected when evaluating the relative probabilities in (3.16). Assuming equal prior
probabilities for all classes, $P(\omega_i) = 1/N_D$ is likewise constant, and the decision
criterion in (3.16) reduces to maximizing the likelihood $P(F^W \mid \omega_i)$ over all $\omega_i$. The
likelihood is estimated from the multivariate Gaussian distribution defined by each
device reference template [TK09]

\[ P(F^W \mid \omega_i) = \frac{1}{(2\pi)^{(N_D-1)/2} \left| \Sigma^W \right|^{1/2}} \exp(\Gamma_e), \tag{3.18} \]

where

\[ \Gamma_e = -\frac{1}{2} \left( F^W - \mu_i^W \right)^T \left( \Sigma^W \right)^{-1} \left( F^W - \mu_i^W \right). \tag{3.19} \]
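The maximum-likelihood decision implied by (3.16) through (3.19) can be sketched for a two-dimensional projected fingerprint. The sketch works with log-likelihoods, which preserves the ordering of the likelihoods while avoiding numerical underflow. The function names, the two-dimensional restriction, and the example templates are hypothetical and serve only to illustrate the rule with a pooled covariance matrix.

```python
import math


def log_likelihood(f, mean, cov):
    """Log of the Gaussian likelihood for a 2-D fingerprint f.

    mean is the class mean vector and cov is the pooled 2 x 2 covariance.
    """
    det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
    inv = [[cov[1][1] / det, -cov[0][1] / det],
           [-cov[1][0] / det, cov[0][0] / det]]
    d = [f[0] - mean[0], f[1] - mean[1]]
    quad = (d[0] * (inv[0][0] * d[0] + inv[0][1] * d[1])
            + d[1] * (inv[1][0] * d[0] + inv[1][1] * d[1]))
    # For dimension 2 the normalizing term is -(2/2) * log(2*pi) - 0.5 * log|cov|.
    return -0.5 * quad - math.log(2 * math.pi) - 0.5 * math.log(det)


def identify(f, class_means, pooled_cov):
    """Return the index of the class with the largest likelihood.

    With equal priors this is the same decision as the largest posterior.
    """
    scores = [log_likelihood(f, m, pooled_cov) for m in class_means]
    return max(range(len(scores)), key=lambda i: scores[i])
```

Because the covariance is pooled across classes, the normalizing term is identical for every class and the decision depends only on the quadratic form, which is why the resulting classifier is linear.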

System  identification  performance  is  evaluated  based  on  the  overall  correct 
identification  percentage,  calculated  as  the  percentage  of  time  the  classifier  correctly 
identifies  the  true  class  of  an  observed  fingerprint  over  all  trials.  Confusion  matrices 
are  generated  to  facilitate  analysis  of  identification  errors,  but  are  not  presented  in 
the  interest  of  space. 

3.7.3.2 Device Verification. For device verification, the captured
fingerprint need only be compared with the specific template corresponding to the
device’s claimed identity to determine authenticity. Again, the similarity measure
used is the Bayesian posterior probability assuming equal priors and equal costs. The
decision for device verification is a binary result, where the device is deemed authentic
when the posterior probability exceeds a pre-determined threshold; otherwise, it is
classified as an impostor. Stated formally, a newly observed device fingerprint $F^W$
is considered authentic when

\[ P(\omega_c \mid F^W) > t, \tag{3.20} \]

where $\omega_c$ is the class to which the device claims to belong, and $t$ is the decision
threshold.
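The threshold test in (3.20) can be illustrated with a short sketch that converts per-class log-likelihoods into the posterior of the claimed class. It assumes equal priors and, as a simplification not stated in the text, that the normalizing term is the sum over the enrolled classes only. The function names are hypothetical.

```python
import math


def posterior(log_likelihoods, claimed):
    """Posterior probability of the claimed class under equal priors."""
    peak = max(log_likelihoods)
    # Subtracting the peak before exponentiating avoids numerical underflow.
    weights = [math.exp(v - peak) for v in log_likelihoods]
    return weights[claimed] / sum(weights)


def verify(log_likelihoods, claimed, threshold):
    """Return True (authentic) when the posterior exceeds the threshold t."""
    return posterior(log_likelihoods, claimed) > threshold
```

For two enrolled classes with log-likelihoods of -1 and -3, the posterior of the first class is 1 / (1 + e^-2), roughly 0.88, so a claim to that identity is accepted at a threshold of 0.5 while a claim to the other identity is rejected.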


For each verification decision, the system can make two types of errors (see
Table 3.1) [JRP04]:

1. incorrectly accepting an impostor as authentic (false accept)

2. incorrectly rejecting an authentic device as an impostor (false reject)

The threshold, t, used for accept/reject decisions can be adjusted to tune
system performance to increase security (reduce false accept errors) or increase
accessibility (reduce false reject errors).

Verification performance is assessed herein by plotting the receiver operating
characteristic (ROC) curve and reporting the corresponding equal error rate (EER) for
each curve. The ROC curve is generated by plotting the true accept rate (TAR) vs. false accept
rate (FAR) as t is varied across the interval [0, 1] [JRP04]. EER corresponds to the
point on the ROC curve where the false reject rate (FRR = 1 - TAR) and FAR are
equal, and is frequently used as a summary statistic to compare the performance of
various classification systems. In general, lower EERs indicate better system classification
performance. The EER achieved for each plotted ROC curve is provided
herein to facilitate future comparisons.

Table 3.1 Verification Error Types.

                    System Decision
Reality         Authentic       Impostor
Authentic       True Accept     False Reject
Impostor        False Accept    True Reject
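The relationship between the decision threshold, the two error types in Table 3.1, and the EER can be sketched as follows. This is an illustrative threshold sweep over hypothetical posterior scores, not the evaluation code used for the reported results; when no threshold gives exactly equal rates, the sketch reports the average of FAR and FRR at the threshold where they are closest, which is an assumption of the example.

```python
def error_rates(genuine, impostor, t):
    """Return (FAR, FRR) when a score is accepted only if it exceeds t."""
    far = sum(s > t for s in impostor) / len(impostor)  # impostors accepted
    frr = sum(s <= t for s in genuine) / len(genuine)   # authentic rejected
    return far, frr


def equal_error_rate(genuine, impostor):
    """Sweep candidate thresholds and return (EER estimate, threshold)."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = error_rates(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]
```

Raising the threshold moves the operating point toward fewer false accepts and more false rejects, so the sweep traces out the same trade-off that the ROC curve displays.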

3.8  Experimental  Methodology 

All  results  were  obtained  by  analyzing  features  of  empirically  collected  unin¬ 
tentional  RF  emissions  from  a  given  number  of  tested  devices.  A  description  of  the 
experimental  setup  and  analysis  methodology  used  herein  is  provided  below. 


3.8.1 Experimental Setup. The unintentional RF emissions of a total of
$N_D = 40$ individual 16-bit PIC24F micro-controllers are evaluated [Inc10]. These
devices include ten unique chips from each of four different part numbers as shown in
Table 3.2. The part numbers were intentionally selected to provide varying degrees
of similarity in device architecture. All ten chips within each part number are from
the same manufacturing lot.

One of the PIC parts (Part A) shares the same basic architecture as the others
but has several on-chip peripherals not present on the other three parts. The other
three PIC parts (B, C, and D) have minimal architectural differences and are identical
except for the amount of on-board flash RAM (64, 48, and 32 kbit respectively).
Hereafter, individual chips are referenced by an alphanumeric designator corresponding
to their respective part number and unique chip number, i.e., A1, A2, ..., D9, D10,
as in Table 3.2.


The PIC devices are representative of the low-cost micro-controllers used in
a variety of real-world commercial embedded systems, including various security
applications [PEK+09], and are easy to obtain through normal commercial channels.
All of the chips were fabricated using an unspecified 180 nm process. Since all chips
within a part number are from the same manufacturing lot, layout and architectural
features are identical; the only anticipated physical differences between chips with
the same part number are those resulting from random (uncontrolled) variations in
the die fabrication and packaging processes.

Table 3.2 Tested PIC micro-controller device classes.

Part Class    Device Numbers    PIC Part Number
A             A1-A10            PIC24FJ64GA102 I/SP
B             B1-B10            PIC24FJ64GA002 I/SP
C             C1-C10            PIC24FJ48GA002 I/SP
D             D1-D10            PIC24FJ32GA002 I/SP


For  device  control  and  measurement,  the  micro-controllers  were  mounted  on 
a  single  evaluation  board  and  programmed  to  respond  to  commands  sent  over  an 
RS-232  serial  interface.  The  circuit  board  was  fixed  in  place  on  a  measurement  table 
using  a  custom  jig  to  minimize  any  lateral  movement  during  or  between  collections, 
and  was  powered  from  a  standard  lab  DC  power  supply  (Agilent  E3631A)  to  reduce 
effects due to uncontrolled supply voltage fluctuations. A detailed description of the
experimental setup used can be found in [CGB+10].


The average amplitude response of the RF signals captured from all chips
in part classes A and B is shown in Fig. 3.2. As might be intuitively expected
given the architectural differences, the average signal response produced by the Class
A chips is noticeably different from the response produced by Class B. The mean
amplitude responses produced by classes C and D are omitted from the plot because
they were visually indistinguishable from the average response of class B at the scale
shown.


3.8.2 Signal Collection. The emissions from each micro-controller were
collected using a near-field probe connected to a LeCroy 104-Xi-A oscilloscope as
shown in Fig. 3.4. The probe acts as an antenna to receive the unintentional
emissions from the device under test and does not directly contact the chip. All
data was collected at a sample rate of $f_s = 2.5$ GSa/sec with a $W_{LP} = 1$ GHz low-pass
anti-aliasing filter inserted between the probe and the oscilloscope.

Figure 3.4 Signal collection and pre-classification processing process.

The high sampling rate used is excessive for the devices under test, which operate
at $F_{CLK} = 29.48$ MHz, but allows post-collection simulation of various receiver
configurations. All results are based on post-processed signals with an effective sample
rate of approximately 208 MSa/sec, which was achieved by passing all collected signals
through a low-pass Butterworth filter having a -3 dB bandwidth of $W_{BB} = 100$ MHz, and
decimating them by a factor of 12 using proper decimation (i.e., every twelfth sample
is retained, all others are discarded).
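The decimation arithmetic can be checked with a short sketch. The low-pass Butterworth filtering step is deliberately omitted here and the input is assumed to be band-limited already; the function and constant names are hypothetical.

```python
def decimate(samples, factor):
    """Keep every `factor`-th sample and discard the rest.

    Assumption: the samples were already low-pass filtered, so discarding
    samples does not alias out-of-band energy into the retained band.
    """
    return samples[::factor]


FS_COLLECTED_HZ = 2.5e9    # oscilloscope sample rate from the text
DECIMATION_FACTOR = 12     # decimation factor from the text
FILTER_BANDWIDTH_HZ = 100e6

effective_rate_hz = FS_COLLECTED_HZ / DECIMATION_FACTOR
# The post-decimation Nyquist frequency should exceed the filter bandwidth.
nyquist_margin_ok = effective_rate_hz / 2 > FILTER_BANDWIDTH_HZ
```

Dividing 2.5 GSa/sec by 12 gives about 208.3 MSa/sec, so the post-decimation Nyquist frequency of about 104.2 MHz sits just above the 100 MHz filter bandwidth.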


To simulate the enrollment process in Section 3.4, each micro-controller is
repeatedly commanded to perform an identical sequence of operations on known
(constant) data. At the start of the operation sequence, the micro-controller asserts
a trigger signal to begin the data acquisition process. In practice, devices intended
for authentication could easily be configured to generate the required trigger. Use
of this technique for non-cooperative recognition tasks would require either precise
detection of the event to be fingerprinted or post-processing to extract the relevant
portion of the captured signal.

For this study, environmental effects (e.g., temperature and supply voltage)
were controlled by warming up each device for approximately 20 minutes to stabilize
the operating temperature and using a bench power supply to provide a stable supply
voltage. After warm-up, $N_{FP} = 500$ signals were collected from each chip as the
challenge-response sequence was repeatedly executed. For practical implementations,
training over the range of expected operating temperatures and supply voltages has
been shown to be an effective technique to deal with environmental fluctuations
[TSU04].

For all collections, acquisition order for the chips was randomly generated
to prevent any collection-order dependent variance. No measures were taken to isolate
the data collection system from background environmental noise; all collections
were performed in an office building environment with numerous co-located PCs and
wireless devices.


3.8.3 K-Fold Cross Validation. Classification performance is evaluated
using K-fold cross validation. Consistent with common practice [TK09], K = 10
is used such that each collection of $N_{FP} = 500$ statistical fingerprints (one for each
sequence of operations) per device is divided into ten blocks, each having $N_K = 50$
fingerprints. Nine blocks from each device are used for training and
one block is “held out” for classification. The training and classification process is
repeated ten times until each of the ten blocks has been “held out” and classified.
Thus, each block of statistical fingerprints is used once for classification and nine
times for training. Final cross-validation performance statistics are calculated by
averaging the results of all K = 10 folds and calculating 95% confidence intervals to
determine statistical significance.
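The block bookkeeping for K-fold cross validation can be sketched as below. The assignment of fingerprints to contiguous blocks is an assumption of the example (the text does not state how fingerprints are allocated to blocks), and the function names are hypothetical.

```python
def k_fold_blocks(n_items, k):
    """Split indices 0..n_items-1 into k contiguous, equal-size blocks."""
    size = n_items // k  # assumes n_items is divisible by k
    return [list(range(i * size, (i + 1) * size)) for i in range(k)]


def k_fold_splits(n_items, k):
    """Yield (training indices, held-out indices) for each of the k folds."""
    blocks = k_fold_blocks(n_items, k)
    for held_out in range(k):
        train = [idx for b, block in enumerate(blocks) if b != held_out
                 for idx in block]
        yield train, blocks[held_out]
```

With 500 fingerprints per device and K = 10, each fold trains on 450 fingerprints and classifies the remaining 50, and every fingerprint is classified exactly once across the ten folds.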
3.8.4 Noise Simulation. Signal collection occurred in a controlled office
environment while limiting outside influences where possible. To assess performance
under less ideal conditions (noisier integrated systems, poor probe positioning,
increased separation between sensor and system under test, etc.), captured signals
are power-scaled for analysis. To evaluate performance under lower SNR conditions,
the collected signals were combined with like-filtered additive white Gaussian noise
(AWGN) to achieve desired analysis SNRs of -50 < SNR < 50 dB in 1 dB increments,
as shown in Fig. 3.4. During pre-classification processing, the baseband signal
and AWGN were digitally filtered using the same filter, i.e., a low-pass Butterworth
filter with a -3 dB bandwidth of $W_{BB} = 100$ MHz.

For each Monte Carlo iteration, a total of $N_Z$ independent AWGN realizations
are generated, filtered, scaled, and added to the collected signals prior to fingerprint
generation. After $N_Z$ Monte Carlo iterations at each SNR, K = 10 fold
cross-validation results are averaged to calculate summary classification performance
statistics. All results are based on $N_{FP} = 500$ empirically collected signals per device
and $N_Z = 100$ simulated AWGN realizations at each analysis SNR, for a total
of ($N_D \times N_Z \times N_{FP} = 40 \times 100 \times 500 = 2\,000\,000$) classification decisions.
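The power scaling used to reach a desired analysis SNR can be sketched as follows. The sketch scales one white Gaussian noise realization so that the ratio of signal power to noise power matches the target. The like-filtering of the noise with the Butterworth filter is omitted, and the function names are hypothetical.

```python
import math
import random


def mean_power(x):
    """Average power of a real-valued sequence."""
    return sum(v * v for v in x) / len(x)


def add_noise_at_snr(signal, snr_db, rng):
    """Return (noisy signal, scaled noise) for a target SNR in dB.

    Assumption: the noise is left unfiltered here; in the study it is
    filtered with the same low-pass filter as the signal before scaling.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    target_noise_power = mean_power(signal) / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / mean_power(noise))
    scaled = [scale * v for v in noise]
    return [s + n for s, n in zip(signal, scaled)], scaled


# Decision count from the text: devices x noise realizations x fingerprints.
TOTAL_DECISIONS = 40 * 100 * 500
```

Because the scale factor is computed from the sample power of the specific realization, the measured SNR of each realization matches the target to within floating-point error.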

3.9 Results

Extensive pilot studies were performed using all $N_D = 40$ devices to assess
1) the influence of various system parameters on overall performance, 2) the ability
of the classification approach to deal with the large number of devices, and 3)
the sensitivity of fingerprint performance to various arbitrarily selected response
sequences. Performance was evaluated by varying the system parameters over
$N_{CL} = \{2, 3, \ldots, 64\}$, $N_R = \{N_{CL}/2, N_{CL}, N_{CL} \cdot 2\}$, and
$W_{BB} = \{25, 50, 100, 250, 500\}$ (MHz).
All pilot studies analyzed signals at the collected SNR (no noise scaling). The effect
of including each of the instantaneous responses (IA, IP, and IF) was also assessed
during the pilot studies. The results presented herein used $N_R = N_{CL} = 32$ and
$W_{BB} = 100$ MHz for statistical fingerprints composed of all three instantaneous
responses (IA, IP, and IF). This combination of parameters and signal responses provides
a reasonable trade-off between system performance and computation time.

Figure 3.5 Average identification performance results (classification success rate)
for K = 10 fold cross-validation with $N_Z = 100$ random noise realizations
at each analysis SNR.

Fig. 3.5 shows overall average, best, and worst-case observed identification
performance for the $N_D = 40$ tested devices in Table 3.2. The 95% CIs were calculated
for the average performance data. However, they are omitted from the plot
for visual clarity since they are approximately the width of the data markers. As
indicated in Fig. 3.5, the classifier achieved an overall average identification success
rate of 99.7% ($\sigma = 0.892$) at the collected SNR (no added noise), and maintained
average identification success rates of 90% or better for simulated SNRs > 15 dB.
Perfect identification (100% classification success rate) was achieved for 27 of the 40
tested devices when analyzed at the collected SNR. The worst observed identification
performance was exhibited by device A6, which was successfully identified 95.7% of
the time at the collected SNR. The confusion matrices (not presented) show that the
majority of identification errors for device A6 are associated with misclassifying device
A6 as either A8 or A4. As discussed in Sec. 3.7.1.1, this is due to the similarity
evident from visual inspection of the device statistical fingerprints depicted in Fig.
3.3. Likewise, the most unique looking fingerprint in Fig. 3.3 (device A9) yields the
best identification performance of all tested devices across the range of simulated
SNRs. As expected, identification performance degrades as the input signals are
corrupted with noise, and approaches the accuracy of a random guess (1/40 = 2.5%)
at the lowest analysis SNRs.


While it is anticipated that the achievable SNR for the envisioned applications
will be very high in practice since emissions are captured using a near-field probe,
it is important to note that the identification accuracies are representative of what
is achievable by extracting a single fingerprint from the device being identified.
Extension of the Bayesian classification approach for multiple extracted fingerprints is
straightforward, and should provide considerable improvement in identification
accuracy. Such improvement may be required for specific applications not considered
here.
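The multi-fingerprint extension could be sketched as follows. This is a minimal illustration, not the classifier used in the study: it assumes Gaussian class likelihoods with a diagonal covariance shared by all classes, uniform priors, and conditionally independent fingerprints, so the per-fingerprint log-likelihoods simply add. The function names and the toy class means are hypothetical.

```python
def log_likelihood(fingerprint, mean, variances):
    """Gaussian log-likelihood up to a class-independent constant, assuming a
    diagonal covariance shared by all classes (an assumption of this sketch)."""
    return -0.5 * sum((x - m) ** 2 / v
                      for x, m, v in zip(fingerprint, mean, variances))


def classify(fingerprints, class_means, variances):
    """Sum log-likelihoods over all fingerprints (independence assumption) and
    return the index of the most likely class under uniform priors."""
    totals = [sum(log_likelihood(f, mean, variances) for f in fingerprints)
              for mean in class_means]
    return max(range(len(totals)), key=totals.__getitem__)


if __name__ == "__main__":
    # Hypothetical 2-D fingerprints for three device classes.
    means = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    variances = [1.0, 1.0]
    noisy_single = [[0.4, 0.0]]                        # ambiguous on its own
    several = [[0.4, 0.0], [0.9, 0.0], [0.8, 0.0]]     # same device, three captures
    print(classify(noisy_single, means, variances))
    print(classify(several, means, variances))
```

In this toy case a single noisy fingerprint is assigned to the wrong class, while three fingerprints from the same device accumulate enough evidence for the correct one.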


To  evaluate  system  verification  performance,  a  total  of  40  ROC  curves  were 
generated  at  each  analysis  SNR,  again  using  K-fold  cross-validation.  The  ROC  curves 
were  generated  by  sequentially  treating  each  tested  device  as  the  claimed  identity  and 
testing  the  verification  performance  against  the  known  true  identity  associated  with 
each  fingerprint.  Each  individual  curve  shows  the  achievable  trade-space  between 
system  security  and  accessibility  as  a  function  of  the  selected  decision  threshold  for 
a particular device. Figs. 3.6 and 3.7 show the ROC curves for the worst-performing
device  identity  (A6)  at  selected  analysis  SNRs. 

At the collected SNR, the system achieved an average EER across all tested
identities of 0.044% (σ = 0.12); 32 of the 40 tested identities exhibited EERs of
< 0.01%. The worst four EERs were 0.19, 0.33, 0.50, and 0.53%. As with the
identification task, worst case verification performance was associated with the template
for device A6, followed closely by device A8. Again, this is due to the similarity
between the statistical fingerprints of the two devices as shown in Fig. 3.3. Fig.
3.6 shows how device verification performance behaves as the input signal quality
degrades: performance decreases significantly as the collected signal is corrupted
with additional simulated noise, eventually reaching near “random” verification
accuracy at SNR = -40 dB. Again, if higher performance is required for a specific
application, verification accuracy can be improved by extracting multiple fingerprints
from the device.
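To make the EER figures concrete, the sketch below shows one common way to estimate an equal error rate from verification scores. It is an illustration only: it assumes that each verification attempt yields a scalar similarity score, that higher scores indicate a better match to the claimed identity, and that the EER is approximated at the observed threshold where the false accept and false reject rates are closest. The function names and sample scores are hypothetical.

```python
def error_rates(genuine, impostor, threshold):
    """Return (false accept rate, false reject rate) when every score greater
    than or equal to `threshold` is accepted."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def equal_error_rate(genuine, impostor):
    """Approximate the EER: try every observed score as the threshold and
    report the mean of FAR and FRR where the two rates are closest."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = error_rates(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, 0.5 * (far + frr))
    return best[1]


if __name__ == "__main__":
    genuine_scores = [0.4, 0.6, 0.8, 0.9]    # claimed identity is true
    impostor_scores = [0.1, 0.2, 0.3, 0.5]   # claimed identity is false
    print(equal_error_rate(genuine_scores, impostor_scores))
```

Sweeping the threshold over its full range and plotting the two rates against each other yields one ROC curve per claimed identity.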


In  this  study,  steps  were  taken  to  ensure  that  the  device  being  authenticated 
is  as  isolated  from  external  effects  as  practical.  In  operational  systems,  it  is  likely 
that  a  particular  chip  is  permanently  integrated  into  a  larger  system  component  or 
board.  For  instance,  in  embedded  systems  or  smart-cards,  the  micro-controller  is 
likely  to  be  soldered  or  otherwise  packaged  internally  to  the  authentication  device. 
Since  the  process  controls  for  integration  and  packaging  may  not  be  as  stringent  as 
those  for  fabrication  of  the  IC  itself,  it  is  believed  that  fingerprinting  of  the  integrated 
device  may  actually  prove  to  be  more  effective  than  fingerprinting  of  isolated  ICs  as 
considered  here. 


While the raw data acquired for this study was obtained using a high speed
digital sampling oscilloscope, sample rate and bandwidth were intentionally limited
using the procedure described in Sec. 3.8.2 to simulate a less capable receiver setup
(e.g., a low-cost data acquisition card). Furthermore, preliminary experiments using
a simulated tuned receiver setup show that a substantial amount of the overall
discriminatory information is carried in narrow sidebands around harmonics of the
device clock frequency. Thus, it is believed that more practical, low-cost receiver
architectures are likely to remain highly effective.

A final observation is that the results suggest acceptable performance may be
achieved for some devices without the need for extensive optimization of the response
sequence. The results herein were obtained by arbitrarily designating several
clock cycles of an overall operation sequence as the response region. No statistical
difference in performance was observed during limited trials when the designated
response region was varied to include sub-regions containing very different microcode
instruction sequences. This suggests that the technique can be implemented in an
opportunistic manner, where the challenge-response sequence can be conveniently
selected from a portion of other authentication procedures or protocol communications.
Additional improvements in performance might be obtained by more carefully
choosing or defining a response that emphasizes device sub-circuitry that exhibits
high inter-device variability. However, it is believed that for many applications the
opportunistic approach will provide sufficient performance without the need for further
performance improvement through response optimization.

Although  obtained  under  controlled  conditions  using  a  single  sensor  module, 
the  results  thus  far  are  very  promising  and  the  proposed  technique  merits  additional 
investigation.  These  results  also  suggest  that  RF-DNA  fingerprinting  is  suitable 
for  other  applications  such  as  Trojan  or  counterfeit  device  detection  and  forensic 
attribution  for  criminal  or  other  investigations. 

A  considerable  amount  of  work  remains  to  fully  understand  the  suitability 
of  RF-DNA  fingerprinting  for  practical  security  implementations.  Intuitively,  the 
nature  of  the  intrinsic  characteristics  that  induce  inter-device  variations  suggests 
a  fingerprint  based  on  those  variations  will  be  extremely  difficult  to  impersonate. 
However,  further  analysis  and  experimentation  are  needed  to  confirm  this.  Other 
areas  for  additional  study  include: 

• Permanence and robustness of RF-DNA features under varying environmental
conditions.

• Sensitivity of fingerprint performance to variations due to different sensor
modules or sensor positioning.

• Scalability to very large databases.

• Optimization of the device response sequence in the challenge-response phase
to maximize device discriminability.

• Suitability for low-cost implementations through bandwidth optimization and
investigation of alternative sensor module architectures.

• Effectiveness for programmable or custom logic ICs such as FPGAs or ASICs.

Figure 3.6 Worst case (device A6) observed ROC curves showing verification
performance across the full range of simulated noise conditions.

Figure 3.7 Worst case (device A6) observed ROC curves showing verification
performance at collected SNR and under simulated high SNR conditions.



Finally, recent research [KTM09, WTR10] has demonstrated significant performance
gains using spectral Fourier-based and wavelet-based RF-DNA fingerprints
for intentional emissions. Application of these approaches to recognition of unintentional
IC emissions remains an area of future research.


3.10  Conclusion 

RF “distinct native attributes” (RF-DNA) possessed by the unintentional emissions
of ICs are a rich source of discriminatory information for device recognition.
Empirical results demonstrate the suitability of RF-DNA fingerprinting for both
identification and verification device recognition tasks. For experimentally collected
emissions, the technique correctly identifies devices greater than 99.5% of the time,
with average verification EERs of less than 0.05% using a single extracted fingerprint.
Correct identification success rates of better than 90% were maintained under
analysis conditions of SNR > 15 dB. Thus, RF-DNA fingerprinting is promising for
anti-cloning and related security applications requiring unique IC device recognition.
Furthermore, the impressive performance indicates the technique may be adaptable
to less ideal conditions while still providing acceptable performance. Finally, these
results were obtained using a single extracted fingerprint, and a substantial improvement
in performance is believed to be realizable through a straightforward extension
of the approach for multiple extracted fingerprints.

3.11  Supplementary  Discussion 

This  section  contains  an  expansion  of  the  material  submitted  as  part  of  this 
article  that  could  not  be  included  due  to  mandatory  space  limitations. 

3.11.1  Experimental  Setup.  This  section  includes  additional  information 
on  the  experimental  setup  used  to  acquire  the  data  for  all  experiments  conducted 
during  this  research.  The  primary  source  of  data  used  for  this  research  (unintentional 
EM  signals)  was  collected  using  AFIT’s  commercial  Riscure  Inspector  side  channel 
analysis system. The experimental setup closely follows the setup described in Sec.
2.7, as shown in Fig. 2.5.

The  system  is  configured  to  collect  unintentional  RF  emissions  from  the  device 
under test using a near-field probe (1 GHz bandwidth) connected to a LeCroy 104-Xi-A
oscilloscope. The probe acts as an antenna to receive the unintentional emissions
from  the  device  under  test,  and  does  not  directly  contact  the  chip.  The  oscilloscope 
has  a  1  GHz  bandwidth,  a  maximum  sample  rate  of  10  GSa/sec,  and  four  channels 
with  12.5  MBits  of  sample  memory  each.  Depending  on  the  device  technology  and 
clock  rate  of  the  device  under  test,  hardware  low-pass  filters  (either  built-in  to  the 
oscilloscope,  or  in-line  filters  inserted  between  the  H-field  probe  and  the  oscilloscope 
input)  are  used  as  necessary  to  prevent  signal  aliasing  due  to  the  analog-to-digital 
sampling  process. 

The  near-field  probe  is  mounted  on  a  computer-controlled  motorized  XYZ  table 
for  consistent  placement  of  the  probe  relative  to  the  device  under  test.  Initial  probe 
position  for  measurements  is  established  by  performing  a  two-dimensional  scan  of 
the  surface  of  the  tested  chip  as  it  repeatedly  executes  the  operation  of  interest. 
The results of the scan are processed with a digital bandpass filter and analyzed
to determine the location of maximal RF energy in the band corresponding to the
known internal clock frequency. The probe and relative device positions remain
fixed  for  all  subsequent  collections.  Custom  jigs  are  fabricated  for  each  test  board  to 
help  stabilize  the  device  being  measured  and  to  ensure  consistent  placement  of  the 
probe  for  subsequent  measurements. 
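The probe-placement scan could be sketched as below. This is not the Riscure Inspector procedure: in place of the digital bandpass filter, the sketch evaluates the discrete Fourier transform of each trace directly at a few frequencies around the clock frequency and sums the squared magnitudes. The grid layout, the parameter names, and the synthetic traces are all assumptions made for illustration.

```python
import cmath
import math


def tone_energy(trace, fs, freq):
    """Squared magnitude of the DFT of `trace` evaluated at `freq` Hz."""
    acc = sum(x * cmath.exp(-2j * math.pi * freq * n / fs)
              for n, x in enumerate(trace))
    return abs(acc) ** 2


def band_energy(trace, fs, f_center, half_width, step):
    """Sum tone energies on a frequency grid spanning the band around f_center."""
    n_steps = int(round(half_width / step))
    return sum(tone_energy(trace, fs, f_center + k * step)
               for k in range(-n_steps, n_steps + 1))


def best_probe_position(scan, fs, f_clock, half_width, step):
    """`scan[x][y]` is the trace captured at grid point (x, y); return the
    grid index with the largest energy in the band around f_clock."""
    best_energy, best_pos = -1.0, None
    for x, row in enumerate(scan):
        for y, trace in enumerate(row):
            energy = band_energy(trace, fs, f_clock, half_width, step)
            if energy > best_energy:
                best_energy, best_pos = energy, (x, y)
    return best_pos


def synthetic_trace(amp_clock, amp_other, fs=1000.0, n=200):
    """Hypothetical capture: a 100 Hz 'clock' tone plus a 300 Hz interferer."""
    return [amp_clock * math.sin(2 * math.pi * 100.0 * i / fs)
            + amp_other * math.sin(2 * math.pi * 300.0 * i / fs)
            for i in range(n)]
```

A strong out-of-band emitter at one grid point does not attract the probe, because only energy near the clock frequency is counted.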


Figure 3.8 Riscure Inspector Side Channel Analysis System [Ris09].


To improve collection efficiency and reduce post-processing for signal alignment,
the training devices are controlled by a PC over an RS-232 serial interface.
Devices are programmed to assert a trigger signal on one of the general purpose
input/output (GPIO) pins at the start of the operation sequence. The oscilloscope
is configured through a PC interface to collect the RF signal for a fixed time interval
each time the trigger is asserted. This enables precise identification and alignment of
the individually collected signals without the need for extensive post-processing. As
described in Chap. 4, triggering based on known features that indicate the start of
the encryption operation using real-time EM emissions is also feasible, but requires
substantial post-processing of the acquired signals to achieve similar alignment.

4. Leakage Mapping: A Systematic Methodology for Assessing the
Side Channel Information Leakage of Cryptographic Implementations


This chapter contains the text of an article that has been submitted for publication
to the Association for Computing Machinery (ACM) Transactions on Information
and System Security [CBL11]. This article was co-authored by Dr. Rusty Baldwin
and Mr. Eric Laspe.


4.1 Abstract

We propose a generalized framework to quantify the side-channel information
leakage from arbitrary cryptographic implementations. The framework provides a
comprehensive methodology to assess the information leakage from all
algorithmically-specified key-dependent intermediate computations for implementations of
symmetric block ciphers. The leakage assessment quantitatively bounds the resistance of an
implementation to the general class of differential side channel analysis techniques.
The leakage mapping framework is demonstrated using the well-known Hamming
Weight and Hamming Distance leakage models, with recommendations for extension
of the technique to more accurate models. The approach is applied to two
typical unprotected implementations of the Advanced Encryption Standard, and the
assessment results are empirically validated against a correlation-based differential
power analysis attack.


4.2 Introduction

Over the last several years, there has been an extensive effort in academia
and industry to address physical implementation attacks on cryptographic systems,
to include both side channel and fault attacks [KJJ99, MQP07, AARR02, ARRS05,
BCO04, SLP05, Sko06, SMY09, GBTP08]. However, relatively little work has been
done to aid cryptographic system designers in practically addressing the identified
security risks.

From a security perspective, it is naive and dangerous to assess implementation
security based on the results of limited testing. For instance, the failure of a
differential  power  analysis  (DPA)  attack  based  on  the  Hamming  Weight  of  an  S-Box 
output  should  not  be  the  basis  for  concluding  the  implementation  is  secure  against 
DPA  or  other  side  channel  analysis  techniques  in  general. 

It is also dangerous to assess the overall strength (or weakness) of an implementation’s
security based on analysis of a small subset of the overall implementation
(e.g.,  one  S-box),  particularly  for  complex  FPGA  or  ASIC  circuits  where  various 
portions  of  the  algorithm  may  be  implemented  in  completely  separate  physical  areas 
of  a  die,  with  very  different  layouts  and  routing  in  each  area. 


From a cryptographic system designer or engineer’s perspective, designing a
system that is secure against the plethora of rapidly evolving physical implementation
attacks is daunting. Nevertheless, to sufficiently characterize the leakage of
an implementation against the full spectrum of tools that may be employed by a
real adversary, it is essential that a systematic evaluation of an implementation be
conducted against the widest variety of attacks possible. In particular, the effectiveness
of countermeasures cannot be adequately understood unless their effects are
comprehensively assessed against the full spectrum of available attack techniques.


The work by Standaert et al. [SMY09] is the first concerted effort to provide a
practice-oriented framework for a fair comparative evaluation of side channel attacks
and implementation leakage. The work provides a solid foundation for the comparison
of attack effectiveness as well as a standard method for quantitatively bounding
the leakage from a particular device.


However, to assess the worst-case leakage of an implementation, Standaert’s
methodology essentially relies on an evaluator’s skill in the subtle art of building an
optimal template attack. Furthermore, the results are only applicable to the targeted
portion of the overall cryptographic algorithm, which may neglect to identify other
exploitable leakages. In light of techniques such as algebraic SCA [RSVC09, RS09],
assessing the overall security of a system based on the leakage from a limited portion
of an overall implementation is simply insufficient.

Herein, we expand on the objectives of [SMY09] by introducing a systematic
approach to comprehensively quantify the leakage characteristics of a given implementation.
We provide a practical tool set for making sound decisions during the
system  design  process  to  achieve,  with  some  certainty,  a  desired  level  of  resistance 
against  differential  side  channel  attacks. 


The contributions of this work include a generalized framework for leakage
assessment of block ciphers, and a method to bound the resistance of an implementation
against the general class of differential SCA techniques. The approach
is applied to two typical unprotected implementations of the Advanced Encryption
Standard (AES), and the assessment results are empirically validated against a
correlation-based DPA attack.


4.3 Background

4.3.1 Side Channel Emissions of ICs. All physical systems, viewed externally,
produce both intended and unintended outputs. The unintended outputs are
quantifiable, physically observable phenomena produced as a side-effect of normal
operation. When an unintended observable outcome is correlated to some aspect
of the internal state or operation of the system, the resulting side channel is said
to leak information. Herein, the term information leakage generically refers to any
such phenomenon that exhibits a statistical relationship to the underlying operations
being performed or data being manipulated.

Information leakage can result from a variety of side channel emission sources.
Known sources include variations in power consumption, radiated electromagnetic
(EM) energy, computation time, and even acoustic or thermal emissions. The various
known sources of side-channel emissions are shown in the block diagram of a notional
two-party communication system in Fig. 1.1.


4.3.2 Information Leakage and Side Channel Analysis. The objective of
side channel analysis is to infer details about internal data or operations based on the
observation of a side channel while that data is being processed. Various mathematical
techniques have been introduced [KJJ99, SLP05, BCO04, GBTP08] that permit
an adversary to exploit relationships between the data manipulated internally to a
device and the externally observable side channels. Even if the underlying data has
a very small influence on the external observable, it is often possible to recover secret
key material from a cryptographic system using these techniques.


Side channel attacks have profound implications for the physical security of
sensitive electronic systems since cryptographic algorithms form the very foundation
of virtually all modern secure systems, with applications ranging from remote keyless
entry systems to secure payment technologies to the protection of various forms
of intellectual property. Because cryptographic key secrecy plays such a central role
in security, preserving that secrecy and understanding the practical implications
of any vulnerabilities that may lead to the disclosure of key material is of critical
importance. Thus, although side channel techniques are generic in the sense that
they can be used to infer a wide variety of information about the internal activity
of an integrated circuit, the majority of the research in this area has been on key
recovery attacks (i.e., side channel attacks) on cryptographic systems.

We develop a framework of techniques that permit a cryptographic engineer
or security evaluator to systematically assess, investigate, and counter the numerous
sources of information leakage that can lead to unintentional disclosure of sensitive
key material.

Hereafter, we focus on symmetric block ciphers, with particular attention to
the Advanced Encryption Standard (AES) [Nat01]. Application of leakage mapping
to other symmetric block ciphers is straightforward, and the general approach introduced
herein can be adapted to most other classes of cryptographic algorithms (e.g.,
stream ciphers, asymmetric algorithms, etc.) with minimal modifications.

4.3.3 Structure of Symmetric Block Ciphers. Symmetric cryptographic
algorithms are typically made up of repetitive sequences called rounds, where each
round is composed of a sequence of primitive operations. A cryptographic operation
is denoted $C_k(m)$, where $k$ is a fixed key drawn from the key space $\mathcal{K}$ and $m$ is the
input message (either plain-text or cipher-text depending on the cryptographic mode
of operation) drawn from the message space $\mathcal{M}$.

To  enable  efficient  implementation  on  a  wide  variety  of  hardware  and  software 
platforms, cryptographic algorithm designers often constrain the underlying primitive
operations to function on small sub-blocks of data no larger than the data bus
width  of  potential  target  platforms.  In  practice,  this  means  that  data  sub-block  sizes 
for  many  algorithms  are  limited  to  eight  or  fewer  bits  to  permit  implementation  on 
low-cost  micro-controllers  or  embedded  processors.  Throughout  this  work,  the  term 
data  is  used  generically  to  include  all  inputs  to  an  operation,  including  the  key. 

A  practical  implication  of  the  constrained  input  size  for  each  intermediate 
computation  is  that  the  intermediate  result  of  each  primitive  operation  step  can  be 
exhaustively predicted with low computational effort. The dependence of an implementation’s
behavior on these data sub-blocks is one of the key concepts exploited
by  many  SCA  attacks. 

For example, consider a typical 8-bit substitution box (S-Box) such as the one
used in the AES Rijndael algorithm. Each AES S-Box produces an output that
depends only on an 8-bit input and an 8-bit portion of the round key. A common
attack scenario involves a known input, i.e., a known plain-text for an encryption
operation. The output can then be exhaustively predicted for each of the $2^8 = 256$
possible round sub-keys. Thus, although both the AES state array and key are 128
bits, resulting in $2^{128 \times 2} \approx 1.2 \times 10^{77}$ possible combinations,¹ the modular structure
of the algorithm enables prediction of all intermediate results after each S-Box stage
with only $16 \times 2^8 = 4096$ computations.
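The exhaustive prediction described above can be sketched in a few lines. The sketch assumes the first-round case in which each S-Box input is a known plain-text byte XORed with an unknown round sub-key byte; the helper names are hypothetical, and the S-Box is generated from its standard algebraic definition (multiplicative inverse in GF(2^8) followed by the affine transformation) rather than copied from a table.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    product = 0
    for _ in range(8):
        if b & 1:
            product ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return product


def build_sbox():
    """Generate the AES S-Box: field inverse followed by the affine transform."""
    sbox = []
    for x in range(256):
        inv = 0 if x == 0 else next(y for y in range(1, 256) if gf_mul(x, y) == 1)
        s = inv
        for shift in range(1, 5):
            # XOR in the inverse rotated left by 1, 2, 3, and 4 bit positions.
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


def predict_sbox_outputs(plaintext, sbox):
    """For each of the 16 known plain-text bytes, predict the S-Box output
    under every one of the 256 possible sub-key guesses."""
    return [[sbox[m ^ k] for k in range(256)] for m in plaintext]


if __name__ == "__main__":
    table = build_sbox()
    predictions = predict_sbox_outputs(list(range(16)), table)
    print(sum(len(row) for row in predictions))  # total number of predictions
```

Each row of the result holds the 256 hypothetical intermediate values for one state byte, so the whole table has 16 x 256 entries regardless of the 128-bit key size.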

The  bit-wise  exclusive-or  (XOR)  function  commonly  employed  in  cryptographic 
algorithms  further  exemplifies  this  point.  An  XOR  operation  effectively  yields  a  data 
sub-block  size  of  1-bit  since  each  bit  of  the  output  only  depends  on  two  input  bits. 
Thus,  to  compute  all  possible  hypothetical  states  given  knowledge  of  half  the  input  to 
an XOR stage (assuming the key bit is the only unknown) requires only $128 \times 2 = 256$
total  computations. 


4.3.4 Advanced Encryption Standard (AES). The AES is a public and
thoroughly documented symmetric block cipher. The reader is referred to [DR01,
Nat01] for an in-depth discussion of the design and mathematics of the algorithm,
which are beyond the scope of this work. A brief overview of the algorithm and the
specific notation employed herein are described below. The notation and algorithm
description have been tailored to allow a more natural description in the context of
SCA attacks and the leakage mapping technique introduced in Sec. 4.4.


4.3.4.1 Notation. Throughout this work, vector representations of
data are denoted by an over-arrow and matrix forms are denoted by bold type. Except
where explicitly specified, each element of a vector or matrix represents one byte
(8 bits) of data. Various other representations of data (e.g., bit- or word-oriented)
are sometimes employed to more naturally match the actual machine representation
used for an arbitrary implementation. Such representations are distinguished by an
underset (n), where n is the number of bits represented by each vector or matrix
element, i.e.,

$\vec{x}$  A vector of bytes where $x_i \in \{0, 1, \ldots, 255\}$ is the ith element of the vector.

$\underset{(n)}{\vec{x}}$  A vector of n-bit elements where $\underset{(n)}{x_i} \in \{0, 1, \ldots, 2^n - 1\}$.

$\mathbf{x}$  A matrix of bytes where $x_{i,j} \in \{0, 1, \ldots, 255\}$ is the element at the ith row and
jth column of the matrix.

$\mathbf{X}$  An n-dimensional matrix, where $n \geq 3$. The notation used to denote individual
elements of the matrix differs, but in general superscripts (e.g., $\mathbf{X}^i$) are reserved
to denote round indices, and subscripts (e.g., $X_{r,c}$) denote individual elements
within a round.

¹ AES can also be implemented with a 192 or 256 bit key. Individual round keys are 128 bits for
all FIPS 197 approved variants of the Rijndael algorithm.


4.3.4.2 AES State Representation. All FIPS 197 approved variants
of the AES algorithm operate on 128-bit data blocks, represented algorithmically as
a (4 x 4) state array composed of 16 bytes of data [Nat01]. Using the above notation,
the bit-vector of the state formed from the individual state bits is

$$\underset{(2)}{\vec{s}} = [b_0 \; b_1 \; b_2 \; \ldots \; b_{127}]_{[1 \times 128]}, \tag{4.1}$$

where $b_i \in \{0, 1\}$ represents the ith bit of data. Each byte of the state array,
$s_i \in \{0, 1, \ldots, 255\}$, is formed from 8 bits of the 128-bit data block. The
(byte-oriented) AES state vector, $\vec{s}$, is then formed from the 16 bytes of AES
state information

$$\vec{s} = [s_0 \; s_1 \; \ldots \; s_{15}]_{[1 \times 16]}, \tag{4.2}$$

where $s_i$ denotes the ith byte, and $i \in \{0, 1, \ldots, 15\}$. The equivalent AES state
matrix² is formed from the bytes of the state vector in column order. The linear
indices of the state vector, $\vec{s}$, can be translated to or from equivalent row and column
indices of the state matrix, $\mathbf{s}$, using the mapping in Fig. 4.1.

Figure 4.1 Illustration of mapping between AES state matrix (indexed by row and
column) and corresponding elements of the equivalent byte-vector.
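The column-order mapping between the state vector and the state matrix can be sketched as follows. Since Fig. 4.1 is not reproduced here, the sketch assumes the FIPS 197 convention in which the byte at linear index i sits at row i mod 4 and column i div 4; the function names are hypothetical.

```python
def linear_to_row_col(i):
    """Map a state-vector index (0..15) to (row, column) in column order."""
    return i % 4, i // 4


def row_col_to_linear(r, c):
    """Map (row, column) of the state matrix back to the state-vector index."""
    return r + 4 * c


def vector_to_matrix(state_vector):
    """Arrange 16 state bytes into a 4x4 matrix with M[r][c] = s[r + 4*c]."""
    if len(state_vector) != 16:
        raise ValueError("the AES state vector must contain exactly 16 bytes")
    return [[state_vector[row_col_to_linear(r, c)] for c in range(4)]
            for r in range(4)]


def matrix_to_vector(state_matrix):
    """Flatten a 4x4 state matrix back into the byte-vector, column by column."""
    return [state_matrix[r][c] for c in range(4) for r in range(4)]


if __name__ == "__main__":
    for row in vector_to_matrix(list(range(16))):
        print(row)
```

Applied to the indices 0 through 15, the first matrix row holds indices 0, 4, 8, and 12, which is the same layout used for the round key bytes in the key matrix.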

The AES master key (in byte-vector form) is denoted $\vec{k}$ and is formed from
the $N_B$ bytes of the key

$$\vec{k} = [k_0 \; k_1 \; \ldots \; k_{N_B - 1}]_{[1 \times N_B]}, \tag{4.3}$$

where $k_i$ is the ith byte for $i \in \{0, 1, \ldots, N_B - 1\}$, and $N_B$ is determined from the
key size as shown in Table 4.1. As with the AES state information, the equivalent
bit-vector representation is denoted $\underset{(2)}{\vec{k}}$.

The master key is expanded into a full key schedule composed of $(N_r + 1)$
individual 128-bit (16-byte) round keys, where $N_r = 10$, 12, or 14 for implementations
using 128, 192, or 256-bit keys, respectively (see Tab. 4.1). The ith round-key (in
byte-vector form) is denoted $\vec{k}^i$ where $i \in \{0, 1, \ldots, N_r\}$.

² Throughout this work, the term state matrix is used interchangeably with the state array
defined in [Nat01].

Table 4.1 Number of intermediate result states computed during AES encryption
and key scheduling (inclusive of input and output states).

                                     # Intermediate Steps
Key Size   N_r   N_B   # Rnd Keys    Cipher   Key Sched.
128        10    16    11            41       70
192        12    24    13            49       70
256        14    32    15            57       79

Each of the $(N_r + 1)$ individual 16-byte round keys, $\mathbf{K}^i$, for round i (in
byte-matrix form) is formed from the individual round key bytes

$$\mathbf{K}^i = \begin{bmatrix}
k^i_0 & k^i_4 & k^i_8 & k^i_{12} \\
k^i_1 & k^i_5 & k^i_9 & k^i_{13} \\
k^i_2 & k^i_6 & k^i_{10} & k^i_{14} \\
k^i_3 & k^i_7 & k^i_{11} & k^i_{15}
\end{bmatrix}_{4 \times 4} \tag{4.4}$$

for $i \in \{0, 1, \ldots, N_r\}$. The full key schedule is constructed as a three-dimensional
matrix denoted $\mathbf{K}_{[4 \times 4 \times (N_r + 1)]}$, where the individual matrix elements are

$K^i_{r,c}$  The element (byte) of the full AES key schedule at row r, column c of the ith
round key, where $i \in \{0, 1, \ldots, N_r\}$.

4.3.4.3 AES Round Structure. Each round of the AES is composed
of a sequence of four primitive operations that manipulate the byte-oriented state
matrix, $\mathbf{s}$. The AES primitive operations are

• AddRoundKey (ARK): Combines the input data with the round key using a
bit-wise XOR operation.

• SubBytes (SB): Independently processes each byte of the state using a
non-linear substitution (S-box).

• ShiftRows (SR): Cyclically shifts (byte-oriented rotation) the rows of $\mathbf{s}$ by
incremental offsets.

• MixColumns (MC): Independently mixes the data within each column (4 bytes)
of $\mathbf{s}$.

A detailed description of each AES primitive operation can be found in the
AES specification [Nat01].

One full round of an AES encryption as defined herein is illustrated in Fig.
4.3. This definition is slightly different from that in [Nat01]. The two descriptions
are semantically equivalent, as depicted in Fig. 4.2, but the round definition herein
allows a more natural mathematical description of side channel attacks and clarifies
the dependency between each individual round key and the locally dependent
intermediate results.


The first $(N_r - 1)$ rounds of an AES encryption are identical and composed
of a sequential application of the ARK, SB, SR, and MC primitive operations. The
final round omits MC, instead applying an additional ARK operation to produce the
cipher-text output, c.
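The round structure described above can be written out as a short sketch that lists the primitive operations per round and counts the resulting states. It only models the ordering of operations, not the operations themselves, and the function names are hypothetical.

```python
def round_operations(n_rounds):
    """List the primitive operations of rounds 0..n_rounds-1 using the round
    definition in the text: every round starts with ARK, and the final round
    replaces MC with an additional ARK."""
    rounds = [["ARK", "SB", "SR", "MC"] for _ in range(n_rounds - 1)]
    rounds.append(["ARK", "SB", "SR", "ARK"])
    return rounds


def count_cipher_states(n_rounds):
    """Count the states of one encryption, inclusive of the plain-text input:
    one state per primitive operation plus the initial state."""
    return 1 + sum(len(ops) for ops in round_operations(n_rounds))


def count_round_keys(n_rounds):
    """Count ARK operations, i.e. the number of round keys consumed."""
    return sum(ops.count("ARK") for ops in round_operations(n_rounds))


if __name__ == "__main__":
    for n_r in (10, 12, 14):
        print(n_r, count_round_keys(n_r), count_cipher_states(n_r))
```

For 10, 12, and 14 rounds this yields 41, 49, and 57 states with 11, 13, and 15 round keys, matching the cipher and round-key columns of Table 4.1.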

The AES state after completing the $j$th intermediate computation step of round
$i$ is denoted $s^{i,j}$, and the individual elements of the AES state matrix after each step
are indexed by their matrix indices, i.e.

$s^{i,j}$  AES state matrix after completing the $j$th intermediate computation step of round $i$,

$s^{i,j}_{r,c}$  The element (byte) of the AES state matrix at row $r$ and column $c$ after
completing the $j$th intermediate computation step of round $i$.

Figure 4.2  Comparison of (a) FIPS 197 round definition to (b) the round definition
used herein. The two definitions are semantically equivalent, but
the modified description in (b) allows a more natural mathematical
description of intermediate result dependencies on round keys.

Thus, $s^{0,0}$ and $s^{N_r-1,4}$ represent the plain-text, m, and cipher-text, c, of an
AES encryption operation, respectively. Since the output of each round is the input
to the next, $s^{i,4} = s^{i+1,0}$ for any two adjacent rounds $i$ and $(i + 1)$.

4.3.5  Modeling Leakage.  The information leakage from a cryptographic
implementation can be modeled at various levels of abstraction depending on how
much is known about the implementation’s internal design architecture (including
both hardware and software aspects).

Herein, a leakage model is defined to include a minimum of two components:

•  A leakage function, $\mathcal{H}(\cdot)$, used to transform one or more intermediate results
into hypothetical leakage values, and

•  A parameter, $N_D$, that defines the assumed data sub-block size used for model
predictions.

Figure 4.3  One full round of an AES encryption, $i \in \{0, \ldots, N_r - 1\}$, based on the
round definition in Fig. 4.2(b).

The leakage function, $\mathcal{H}(\cdot)$, can operate on a single static state value or on two
different states. The latter definition is used to create models based on the transistor
switching activity or other underlying physical phenomena influenced by the change
from one state to another.


The  term  leakage  model,  when  considered  in  combination  with  the  particular 
mathematical  technique  used  in  a  differential  side-channel  analysis  (DSCA)  attack, 
is  closely  related  to  the  concepts  of  side  channel  distinguisher  and  selection  function 
employed in other works (e.g., [SGV08]). Herein, the leakage model is considered
separately  from  the  mathematical  tools  used  to  analyze  the  relationship  between  the 
modeled  data  and  actual  leakage  data. 


4.3.5.1  Algorithmic Leakage Models.  In general, cryptographic system
designers have a great deal of flexibility in how they implement an algorithmic
specification. It is common, for example, for designs to combine primitive operations
or other steps to efficiently use the resources available on the target platform, e.g.,
the T-Table implementation commonly used in 32-bit software implementations of
the  AES  [DROU- 

Theoretically, given sufficient resources, a symmetric algorithm with a 128-bit
key and a 128-bit data block size could be implemented as a one-step lookup table
with $2^{128 \times 2} \approx 1.16 \times 10^{77}$ entries. In practice, this is clearly infeasible, and designs
can be expected to substantially follow the algorithmic specification, at a minimum
producing the same outputs as the reference implementation after each round.

Because implementations of cryptographic algorithms generally follow the
algorithmic specification to some degree, algorithmic leakage models can be constructed
even if no details of the underlying implementation architecture are known.

Because generality of the systematic approach was an important objective of
this research, such models are the basis of our work as they can be applied to any
implementation with minimal modification. Supplementing these techniques with
additional implementation knowledge is straightforward, and would lead to a more
conservative leakage assessment relative to the worst-case scenario where an
adversary is able to obtain substantial information about the underlying implementation.


4.3.5.2  Common Leakage Models.  A variety of leakage models have
been suggested [KJJ99, MOP07, BCO04] with varying degrees of complexity. In
practice, even very simple algorithmic leakage models have been shown to be adequate
to carry out a wide variety of side channel attacks. Such simple models also
provide a solid basis for a systematic leakage assessment of most implementations while
maintaining the generality of the technique. Two common leakage transformations
proposed in previous work, Hamming Weight and Hamming Distance of intermediate
results, are described below.


The Hamming Weight leakage transformation assumes a relationship between
the number of non-zero bits in an intermediate result of interest and the resulting
data-dependent variance leaked through a side channel. The Hamming Weight
leakage transformation, $\mathcal{HW}(\cdot)$, for an arbitrary bit-vector, $x^{(2)}$, with $n$ elements is

$$\mathcal{HW}\left(x^{(2)}\right) = \sum_{i=0}^{n-1} x_i^{(2)} \qquad (4.5)$$

where $x_i^{(2)} \in \{0, 1\}\ \forall\, i < n$.

For data representations other than bit-vectors, it is assumed that each grouping
of $N_D$ bits is converted to an equivalent bit-vector representation before the
Hamming Weight is computed. In the interest of conciseness, we omit the details of
these conversions. Herein, the HW leakage transformation operates on vectors or
matrices as well as individual elements, i.e.

$$s' = \mathcal{HW}(s)$$

denotes the Hamming Weight leakage transformation of the state matrix s, where
each matrix element’s Hamming Weight is computed independently. The Hamming
Weight transformation results in an output matrix with the same dimensionality as
the input.


Hamming Distance is related to the Hamming Weight, but measures the number
of bits that differ between two bit-vectors of the same length. The Hamming
Distance, denoted $\mathcal{HD}(\cdot,\cdot)$, between two $n$-element bit-vectors, $x^{(2)}$ and $y^{(2)}$, is

$$\mathcal{HD}\left(x^{(2)}, y^{(2)}\right) = \sum_{i=0}^{n-1} x_i^{(2)} \oplus y_i^{(2)} \qquad (4.6)$$

where $\oplus$ denotes the bit-wise exclusive-or operator, and

$$0 \le \mathcal{HD}(\cdot,\cdot) \le n. \qquad (4.7)$$

In practice both the Hamming Weight and Hamming Distance models effectively
account for a significant portion of the variance in side channel emissions during
certain operations. For example, the Hamming Weight model is normally highly
accurate for operations involving a pre-charged data bus such as those commonly used
on micro-controllers.

Likewise, the Hamming Distance model has been found quite accurate for
operations that toggle the state of the bits stored in a hardware register or latch.
An appropriately constructed Hamming Distance model can effectively account for
virtually all variance in the side channel emissions of even an iterative FPGA-based
hardware implementation of the AES [SOP04, SMPQ06].
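
The two transformations in (4.5) and (4.6) can be sketched in a few lines of Python. This is a minimal illustration, not the tooling used for the results herein; the function names are hypothetical, and values are assumed to be non-negative integers whose binary representation serves as the bit-vector.

```python
def hamming_weight(x: int) -> int:
    """Number of non-zero bits in the non-negative integer x, as in (4.5)."""
    return bin(x).count("1")


def hamming_distance(x: int, y: int) -> int:
    """Number of bit positions in which x and y differ, as in (4.6)."""
    return hamming_weight(x ^ y)


def hw_state(state):
    """Element-wise Hamming Weight; the output has the same shape as the input."""
    return [[hamming_weight(b) for b in row] for row in state]


state = [[0x00, 0x0F], [0xFF, 0xA5]]
print(hw_state(state))                # [[0, 4], [8, 4]]
print(hamming_distance(0x0F, 0xA5))   # 0x0F ^ 0xA5 = 0xAA, so 4
```

Because the Hamming Distance is the Hamming Weight of the XOR of its arguments, the bound in (4.7) follows directly from the bound on the Hamming Weight of an n-bit value.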


4.3.5.3  Improved Leakage Models.  In addition to the simple models
described above, more complex (and accurate) models can also be constructed. For
example, a straightforward improvement to the Hamming Distance model, known
as a switching distance model, distinguishes between $0 \to 1$ and $1 \to 0$ bit
transitions [PSQ07] by assigning different weights to each transition. Application of the
switching distance model within the leakage mapping framework is straightforward.
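
As a rough sketch of the idea, the switching distance can be computed by counting rising and falling bit transitions separately. The function name and the default weights below are placeholders chosen only for illustration; they are not values taken from [PSQ07] or from this work.

```python
def switching_distance(old: int, new: int, n_bits: int = 8,
                       w_rise: float = 1.0, w_fall: float = 0.8) -> float:
    """Weighted count of 0->1 and 1->0 transitions between two n_bits-wide values.

    w_rise and w_fall are hypothetical weights; in practice they would be
    estimated for the device under evaluation.
    """
    mask = (1 << n_bits) - 1
    rising = ~old & new & mask     # bits that were 0 and became 1
    falling = old & ~new & mask    # bits that were 1 and became 0
    return w_rise * bin(rising).count("1") + w_fall * bin(falling).count("1")


# Four rising and four falling transitions.
print(switching_distance(0x0F, 0xF0))
# With equal unit weights the model reduces to the Hamming Distance.
print(switching_distance(0x0F, 0xA5, w_rise=1.0, w_fall=1.0))  # 4.0
```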

If  details  of  the  hardware  architecture  (i.e.,  the  net-list)  are  available,  the 
switching  activity  of  the  device  can  be  simulated  at  the  transistor  level.  Although 
such  models  are  highly  accurate,  their  employment  requires  access  to  detailed  design 
information  that  is  difficult  to  obtain  for  commercial  hardware  devices.  Furthermore, 
the  time  required  to  simulate  the  full  circuit  over  a  large  number  of  random  inputs 
makes  such  models  inappropriate  for  our  use. 

A more promising technique for integration into a systematic leakage assessment
methodology is the use of linear regression-based models [SLP05]. Although not
considered in this initial study, such models can be constructed relatively efficiently.
Furthermore, whereas the Hamming Weight and Distance models assume a fixed
leakage transformation across all sampled instants in time, a regression-based
approach could adapt the model for each sample. Models constructed using regression
techniques might be considered leakage agnostic since they can adaptively capture
the statistical relationship between the variance of the observed signal at each sample
and the data of interest. It is believed that such an approach would be substantially
more robust at the expense of significantly increased computational effort.


4.3.6  Measures of Information Leakage.  Various approaches are used
to quantify information leakage. The most common include Kocher’s difference
of means [KJJ99], Pearson’s product-moment correlation [BCO04], and Shannon’s
entropy and mutual information [GBTP08, SMY09, BGP+11]. Correlation is used
herein based on its computational efficiency and effectiveness in practical attacks as
demonstrated through numerous empirical studies [MOP07, SGV08]. Other measures
such as the entropy or mutual information-based approach could also be used given
sufficient computational resources, and may prove to be more suitable for
implementations that exhibit leakages that do not conform to the Gaussian assumptions that
underlie the correlation approach.


Pearson’s correlation coefficient measures the strength of a positive or negative
linear relationship between two random variables. Although independent random
variables will have a correlation of $\rho = 0$, the converse does not hold: $\rho = 0$ does
not guarantee independence, since the measure is only sensitive to linear relationships.


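The population correlation coefficient, $\rho_{XY}$, between two real-valued random
variables $X$ and $Y$ is

$$\rho_{XY} = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X) \cdot \mathrm{Var}(Y)}} \qquad (4.8)$$

where $-1 \le \rho \le 1$ [RN88]. The sample correlation coefficient, $r_{XY}$, for $N_t$
observations of the random variables, $X$ and $Y$, is

$$r_{XY} = \mathrm{Corr}(X, Y) = \frac{\sum_{i=1}^{N_t} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{N_t} (X_i - \bar{X})^2 \cdot \sum_{i=1}^{N_t} (Y_i - \bar{Y})^2}} \qquad (4.9)$$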


where $\bar{X}$ and $\bar{Y}$ are the observed sample means over the $N_t$ traces. This
formulation has the practical advantage of allowing the correlation coefficient to be updated
incrementally by adding new observations (rows) to each matrix. The sample
correlation coefficient will approach its true value given a sufficiently large number of
sample observations.
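
One way to realize the incremental update is to keep running sums and evaluate an algebraically equivalent form of (4.9) on demand. The sketch below is an assumption about how this could be done, not a description of the implementation used herein; the class name is hypothetical, and the running-sum form is less numerically robust than a two-pass computation when the means are large relative to the variances.

```python
import math


class RunningCorrelation:
    """Sample correlation of (x, y) pairs, updated one observation at a time."""

    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxx = self.syy = self.sxy = 0.0

    def update(self, x, y):
        """Fold one new observation into the running sums."""
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.syy += y * y
        self.sxy += x * y

    def value(self):
        """Current r_XY; 0.0 is returned if either variable has no variance."""
        cov = self.n * self.sxy - self.sx * self.sy
        var_x = self.n * self.sxx - self.sx ** 2
        var_y = self.n * self.syy - self.sy ** 2
        if var_x <= 0 or var_y <= 0:
            return 0.0
        return cov / math.sqrt(var_x * var_y)


linear = RunningCorrelation()
for x in (1, 2, 3, 4):
    linear.update(x, 2 * x)
print(linear.value())      # 1.0

parabola = RunningCorrelation()
for x in (-2, -1, 0, 1, 2):
    parabola.update(x, x * x)
print(parabola.value())    # 0.0
```

The second example shows a fully dependent pair (y = x²) whose correlation is nevertheless zero, because the dependence is not linear.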
4.4  Leakage Mapping

Our  leakage  mapping  procedure  is  a  systematic  approach  which  characterizes 
the  information  leakage  from  an  implementation  throughout  an  entire  cryptographic 
operation,  including  intermediate  results  computed  during  1)  all  rounds  of  the  cipher, 
and  2)  related  activities  such  as  key-scheduling.  This  is  done  by  carrying  out  a  known 
key  SCA  analysis  procedure  for  all  algorithmically  specified  intermediate  results, 
inclusive  of  input  and  output,  under  the  assumption  of  various  selected  leakage 
models. 

As  described  above,  Pearson’s  correlation  coefficient  quantifies  the  magnitude 
of  the  identified  leakages.  The  overall  procedure  is  composed  of  the  following  steps 

1.  Acquire a sufficiently large number of side channel leakage traces while
randomizing key and input data,

2.  Pre-compute  all  algorithmically  specified  intermediate  results, 

3.  Model  the  data-dependent  leakage  variations  based  on  the  known  intermediate 
results, 

4.  Use  a  known-key  correlation  procedure  to  identify  and  quantify  all  potentially 
exploitable  leakages,  and 

5.  Interpret  the  results  to  bound  the  number  of  traces  required  for  key  inference 
based  on  maximum  observed  correlations,  and  prepare  summary  statistics  and 
visual  representations  of  the  data  to  aid  in  qualitative  leakage  assessment. 

Each  step  of  the  leakage  mapping  procedure  is  described  in  detail  below. 

Step 1. Acquire profiling side channel data.  The first step of the leakage mapping
procedure is to collect a suitable number of side channel traces while the evaluated
device performs cipher operations on known (random) input/key pairs. The
number of traces, $N_t$, required to sufficiently characterize side channel leakage is
highly implementation dependent.

Ideally, the leakage from a device would be measured under replications of
all possible input/key combinations to adequately characterize the noise induced by
non-cipher related sources (e.g., ambient or environmental noise, unrelated circuit
activity, nearby off-die electronic components, etc.). Since this is clearly impractical
(it would be quicker to brute force the key), a more reasonable approach is to choose
the number of profiling traces, $N_t$, based on the desired SCA-resistance of the
implementation. For example, the design objective might be to achieve DPA-resistance
for up to $N_t < 10^6$ traces.

The side-channel signal observed during each of the $N_t$ encryption operations
is digitally sampled at a fixed rate, typically using a high-speed digital sampling
oscilloscope (DSO) or similar data acquisition device. For each measured trace, $N_s$
samples of side channel data are acquired. The vector $\mathbf{L}_i$ composed of the digital
samples from a single observed cryptographic operation is referred to as a trace. The
vector containing the samples from the $i$th trace is then

$$\mathbf{L}_i = \begin{bmatrix} L_{i,0} & L_{i,1} & \cdots & L_{i,N_s-1} \end{bmatrix}_{1 \times N_s} \qquad (4.10)$$

For analysis, a measurement matrix, $\mathbb{L}$, composed of traces corresponding to
the $N_t$ individual $(m, k)$ pairs is formed as

$$\mathbb{L} = \begin{bmatrix} \mathbf{L}_0 \\ \mathbf{L}_1 \\ \vdots \\ \mathbf{L}_{N_t-1} \end{bmatrix}_{N_t \times N_s} \qquad (4.11)$$

where $L_{i,j}$ corresponds to the $j$th sample of trace $i$. Thus, each row of the
measurement matrix corresponds to a trace taken for the $i$th $(m, k)$ pair, and each column
corresponds to the sample taken at a particular instant in time relative to the start
of each encryption operation, across all observed traces.

For each trace, the corresponding input and master key are also stored in
byte-matrices for later use, i.e.,

$$\mathbb{M} = \begin{bmatrix} m_0 \\ m_1 \\ \vdots \\ m_{N_t-1} \end{bmatrix}_{N_t \times 16} \qquad (4.12)$$

and,

$$\mathbb{k} = \begin{bmatrix} k_0 \\ k_1 \\ \vdots \\ k_{N_t-1} \end{bmatrix}_{N_t \times N_B} \qquad (4.13)$$

The result of the acquisition phase is composed of three sets of data:

1.  The $(N_t \times N_s)$ measurement matrix, $\mathbb{L}$, containing the digitally sampled side
channel data,

2.  The $(N_t \times 16)$ input matrix, $\mathbb{M}$, containing the plain-text associated with each
captured leakage trace, and

3.  The $(N_t \times N_B)$ master key matrix, $\mathbb{k}$, containing the master key associated
with each captured leakage trace.
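
The bookkeeping of the acquisition phase can be sketched as follows. Everything device-specific is an assumption here: `capture_trace` is a hypothetical callback standing in for the oscilloscope and target device, and `fake_capture` returns only noise so that the example runs without hardware.

```python
import random


def acquire_profiling_data(n_traces, n_samples, capture_trace, key_bytes=16, seed=None):
    """Return (L, M, K): traces, plain-texts and master keys, one row per trace."""
    rng = random.Random(seed)
    L, M, K = [], [], []
    for _ in range(n_traces):
        m = bytes(rng.randrange(256) for _ in range(16))         # random plain-text
        k = bytes(rng.randrange(256) for _ in range(key_bytes))  # random master key
        trace = list(capture_trace(m, k))                         # one row of L
        if len(trace) != n_samples:
            raise ValueError("every trace must contain n_samples samples")
        L.append(trace)
        M.append(list(m))
        K.append(list(k))
    return L, M, K


_noise = random.Random(0)


def fake_capture(m, k, n_samples=50):
    """Hypothetical stand-in for a real acquisition; returns Gaussian noise only."""
    return [_noise.gauss(0.0, 1.0) for _ in range(n_samples)]


L, M, K = acquire_profiling_data(100, 50, fake_capture, seed=1)
print(len(L), len(L[0]), len(M[0]), len(K[0]))   # 100 50 16 16
```

Keeping the three matrices row-aligned by trace index is what later allows each predicted leakage value to be paired with the correct measured trace.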

Step 2. Intermediate result computation.  To characterize the security of a
particular cryptographic implementation against the range of possible SCA attacks,
it is necessary to quantify information leakage for the duration of the full cryptographic
operation. Most published SCA attacks target the outer rounds of the cipher since
the intermediate results of those rounds can be exhaustively predicted if the input or
output is known. Standaert, et al. introduced the notion of predictability to describe
intermediate results that meet this requirement [SOP04].

    function AESComputeIntermediates(m, K)
        s^{0,0} ← m
        for i ← 0, N_r − 2 do
            s^{i,1} ← AddRoundKey(s^{i,0}, K^{i})
            s^{i,2} ← SubBytes(s^{i,1})
            s^{i,3} ← ShiftRows(s^{i,2})
            s^{i+1,0} ← MixColumns(s^{i,3})
        end for
        s^{N_r−1,1} ← AddRoundKey(s^{N_r−1,0}, K^{N_r−1})
        s^{N_r−1,2} ← SubBytes(s^{N_r−1,1})
        s^{N_r−1,3} ← ShiftRows(s^{N_r−1,2})
        s^{N_r,0} ← AddRoundKey(s^{N_r−1,3}, K^{N_r})
        return s
    end function

Figure 4.4  Pseudo code to calculate and store all algorithmically specified
intermediate results computed during the full AES encryption operation for
a given input, m, and key schedule, K.


However, more recent techniques such as algebraic side channel analysis and
chosen or adaptive input techniques [RS09, RSVC09, LPdH10] have demonstrated
the ability to directly target multiple leaked intermediate results from the middle
rounds. Therefore, limiting an evaluation to better-known outer-round attacks
neglects important exploitable leakages. This is particularly true for implementations
that implement protective countermeasures based on the assumption that only outer
rounds will be attacked, leaving the inner rounds unprotected.

The AESComputeIntermediates algorithm in Fig. 4.4 executes an AES encryption
algorithm as described in [Nat01] while preserving the state after each AES
primitive operation. The retained intermediate state information is combined to
construct a five-dimensional intermediate cipher state matrix, $\mathbb{S}_{[N_t \times 4 \times 4 \times 4 \times (4 \cdot N_r)+1]}$
(Fig. 4.5). The individual elements of the full matrix for the $t$th trace are denoted
$\mathbb{S}^{i,j}_{t,r,c}$.

Figure 4.5  Construction of the intermediate cipher state matrix, $\mathbb{S}$, for a single
AES encryption operation. For an AES encryption operation there are
$N_S = (4 \cdot N_r) + 1$ separate states computed, inclusive of the plain-text
and cipher-text.
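
The procedure of Fig. 4.4 can be sketched as runnable Python. This is an illustrative re-implementation under stated assumptions, not the code used herein: the state is held as a flat list of 16 bytes in the column-major order of [Nat01] (byte (r, c) at index r + 4c), the S-box is generated from its algebraic definition rather than stored as a table, and all function names are hypothetical.

```python
def _xtime(a):
    """Multiply by x in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    return (a ^ 0x11B) if a & 0x100 else a


def _gmul(a, b):
    """Multiply two elements of GF(2^8)."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a, b = _xtime(a), b >> 1
    return p


def _sbox_entry(a):
    inv = 0
    if a:
        inv = 1
        for _ in range(254):          # a^254 is the multiplicative inverse of a
            inv = _gmul(inv, a)
    s = inv
    for shift in range(1, 5):         # affine transformation of the AES S-box
        s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
    return s ^ 0x63


SBOX = [_sbox_entry(a) for a in range(256)]


def expand_key(key):
    """Return the N_r + 1 round keys, each a flat list of 16 bytes."""
    nk = len(key) // 4
    nr = nk + 6
    words = [list(key[4 * i:4 * i + 4]) for i in range(nk)]
    rcon = 1
    for i in range(nk, 4 * (nr + 1)):
        t = list(words[i - 1])
        if i % nk == 0:
            t = [SBOX[b] for b in t[1:] + t[:1]]   # SubWord(RotWord(.))
            t[0] ^= rcon
            rcon = _xtime(rcon)
        elif nk > 6 and i % nk == 4:
            t = [SBOX[b] for b in t]
        words.append([a ^ b for a, b in zip(words[i - nk], t)])
    return [sum(words[4 * r:4 * r + 4], []) for r in range(nr + 1)]


def add_round_key(s, k):
    return [a ^ b for a, b in zip(s, k)]


def sub_bytes(s):
    return [SBOX[b] for b in s]


def shift_rows(s):
    # Row r is rotated left by r columns.
    return [s[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4)]


def mix_columns(s):
    out = []
    for c in range(4):
        a = s[4 * c:4 * c + 4]
        out += [_gmul(a[r], 2) ^ _gmul(a[(r + 1) % 4], 3) ^ a[(r + 2) % 4] ^ a[(r + 3) % 4]
                for r in range(4)]
    return out


def aes_compute_intermediates(m, round_keys):
    """Return {(i, j): state} for every intermediate state s^{i,j} of Fig. 4.4."""
    nr = len(round_keys) - 1
    states = {(0, 0): list(m)}
    for i in range(nr):
        states[(i, 1)] = add_round_key(states[(i, 0)], round_keys[i])
        states[(i, 2)] = sub_bytes(states[(i, 1)])
        states[(i, 3)] = shift_rows(states[(i, 2)])
        if i < nr - 1:
            states[(i + 1, 0)] = mix_columns(states[(i, 3)])
        else:                          # final round: extra AddRoundKey instead of MixColumns
            states[(i + 1, 0)] = add_round_key(states[(i, 3)], round_keys[nr])
    return states


key = bytes.fromhex("000102030405060708090a0b0c0d0e0f")
m = bytes.fromhex("00112233445566778899aabbccddeeff")
states = aes_compute_intermediates(m, expand_key(key))
print(len(states))                    # 41, i.e. (4 * N_r) + 1 for AES-128
print(bytes(states[(10, 0)]).hex())   # cipher-text of the FIPS 197 AES-128 example vector
```

Storing the states in a dictionary keyed by (i, j) mirrors the $s^{i,j}$ notation; stacking these dictionaries over all traces gives the intermediate cipher state matrix.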


The same basic process is repeated to construct the full intermediate key schedule
state matrix, $\mathbb{T}_{[N_t \times N_{ks} \times 4]}$, based on the intermediate results computed during a
straightforward implementation of the AES key expansion algorithm. Since a
different (random) key is associated with each trace, one $\mathbb{T}$ matrix is computed and
saved for each acquired profiling trace. Individual matrix elements are denoted $\mathbb{T}^{i,j}_{t}$,
where $t$ is the trace index, $i$ is the loop counter as defined in the key expansion
algorithm in [Nat01], and $j$ is an index corresponding to the individual intermediate
computational step within each loop iteration.


Step 3. Model data-dependent leakage variations.  To discover all potentially
exploitable data-dependent variability in side channel emissions, several different
leakage models should be considered. Although some general observations have been
made in previous work about which models are best suited for specific architectures,
focusing too narrowly on a particular leakage model and corresponding attack is likely
to neglect other important sources of information leakage. Rather than choosing one
specific leakage model that may not be well-suited to the particular implementation
being considered, a more systematic approach is taken to avoid any inadvertent
oversights.

The results herein were prepared by considering leakage models based on Hamming
Weight and the Hamming Distance of the states separated by 1 to 4 computational
steps, for all algorithmically specified intermediate computation results. The
Hamming Distance between the result of two states is denoted $\mathcal{HD}_n(\cdot,\cdot)$ where $n$ is
the number of intermediate computation steps separating the initial and final state
values used in the computation. For example, the hypothetical leakage matrix, $\mathbb{H}$,
modeled as the Hamming Distance between the AES plain-text and the output of
round 0 (input to round 1) is

$$\mathbb{H} = \mathcal{HD}_4\left(\mathbb{S}^{0,0}, \mathbb{S}^{1,0}\right),$$

since the inputs to consecutive rounds are separated by four primitive computational steps.

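The leakage assessments conducted to validate the approach introduced herein
consider each of the following leakage transformations:

•  $\mathcal{HW}$:  $\mathcal{HW}(\mathbb{S}^{0,0}), \ldots, \mathcal{HW}(\mathbb{S}^{N_r,0})$

•  $\mathcal{HD}_1$:  $\mathcal{HD}_1(\mathbb{S}^{0,0}, \mathbb{S}^{0,1}), \ldots, \mathcal{HD}_1(\mathbb{S}^{N_r-1,3}, \mathbb{S}^{N_r,0})$

•  $\mathcal{HD}_2$:  $\mathcal{HD}_2(\mathbb{S}^{0,0}, \mathbb{S}^{0,2}), \ldots, \mathcal{HD}_2(\mathbb{S}^{N_r-1,2}, \mathbb{S}^{N_r,0})$

•  $\mathcal{HD}_3$:  $\mathcal{HD}_3(\mathbb{S}^{0,0}, \mathbb{S}^{0,3}), \ldots, \mathcal{HD}_3(\mathbb{S}^{N_r-1,1}, \mathbb{S}^{N_r,0})$

•  $\mathcal{HD}_4$:  $\mathcal{HD}_4(\mathbb{S}^{0,0}, \mathbb{S}^{1,0}), \ldots, \mathcal{HD}_4(\mathbb{S}^{N_r-1,0}, \mathbb{S}^{N_r,0})$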
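
The list above amounts to pairing every state with the state n steps later in the computation. A minimal sketch for a single trace, assuming the states are supplied as an ordered list of equal-length byte sequences (plain-text first, cipher-text last); the function name and dictionary keys are hypothetical:

```python
def hypothetical_leakages(states, max_separation=4):
    """Byte-wise HW and HD_n predictions for one trace.

    states: ordered list [s^{0,0}, s^{0,1}, ..., s^{Nr,0}] of byte sequences.
    Returns a dict mapping model name to a list of per-state prediction rows.
    """
    def hw(b):
        return bin(b).count("1")

    models = {"HW": [[hw(b) for b in s] for s in states]}
    for n in range(1, max_separation + 1):
        # Entry q of HD_n compares state q with state q + n.
        models[f"HD{n}"] = [
            [hw(a ^ b) for a, b in zip(states[q], states[q + n])]
            for q in range(len(states) - n)
        ]
    return models


# Dummy states: 41 states of 16 bytes, as for AES-128.
dummy = [[q] * 16 for q in range(41)]
models = hypothetical_leakages(dummy)
print({name: len(rows) for name, rows in models.items()})
# {'HW': 41, 'HD1': 40, 'HD2': 39, 'HD3': 38, 'HD4': 37}
```

Indexing each Hamming Distance entry by its first state matches the convention that (i, j) refers to the first argument of the transformation.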

Finally, for each considered leakage transformation, the data sub-block size
is varied over $N_D \in \{1, 8, 16, 32, 64, 128\}$ to account for differing natural machine
representations that might be present on an arbitrary hardware or software AES
implementation. Although it is impractical to directly exploit (predict) the larger
($N_D \ge 64$) data sub-block sizes using exhaustive techniques such as standard DPA
attacks, the information gleaned from analysis of leakages based on models of
the larger data block sizes aids in understanding what aspects of an implementation
are causing leakages.

This semi-comprehensive approach allows the technique to be applied to various
implementations of the AES that optimize the algorithm for a particular hardware
or software architecture (e.g., use of T-Tables) without loss of generality.
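
Varying the sub-block size can be illustrated by regrouping the 128-bit state before taking the Hamming Weight. The grouping order below (consecutive bits of the state read as one big-endian integer) is an assumption made for the sketch; an actual evaluation would group bits according to the data paths of the target architecture.

```python
def hw_sub_blocks(state_bytes: bytes, n_d: int):
    """Hamming Weight of each consecutive n_d-bit sub-block of the state."""
    total_bits = 8 * len(state_bytes)
    if total_bits % n_d:
        raise ValueError("n_d must divide the state size in bits")
    value = int.from_bytes(state_bytes, "big")
    mask = (1 << n_d) - 1
    n_blocks = total_bits // n_d
    return [
        bin((value >> (total_bits - (k + 1) * n_d)) & mask).count("1")
        for k in range(n_blocks)
    ]


state = bytes(range(16))
for n_d in (8, 32, 128):
    print(n_d, hw_sub_blocks(state, n_d))
# 8 [0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4]
# 32 [4, 8, 8, 12]
# 128 [32]
```

Each choice of $N_D$ yields a different number of prediction columns per state (128, 16, 8, 4, 2, or 1 for a 128-bit state), which is why the correlation procedure must be adjusted for the sub-block size.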

The result of this step is the hypothetical leakage matrices containing the
predicted leakages under each assumed model, i.e.

$$\mathbb{H}_{(N_D)} = \mathcal{H}\left(\mathbb{S}_{(N_D)}\right) \qquad (4.14)$$

for $\mathcal{HW}$ models, where $N_D$ is the data sub-block size and $\mathcal{H}$ is the leakage
transformation function. The individual elements of a byte-oriented ($N_D = 8$) hypothetical
leakage matrix are denoted $\mathbb{H}^{i,j}_{t,r,c}$ where $i$ is the AES round, $j$ is the computational
step within the round, $t$ is the trace index, and $(r, c)$ are the row and column of the
AES state matrix. Note that for Hamming Distance models, $(i, j)$ corresponds to
the first argument in the $\mathcal{HD}(\cdot,\cdot)$ leakage transformation.

Leakages  related  to  the  key  scheduling  procedure  are  modeled  in  a  similar 
manner.  The  details  of  the  procedure  are  omitted  here  in  the  interest  of  space. 

Step 4. Known-Key Correlation.  The fourth step of the leakage mapping
procedure is to quantify the leakage from an implementation under the assumptions of
each considered leakage model. The objectives of this step are two-fold:


1.  To  identify  potentially  exploitable  leakages  (data-dependent  variations)  of  all 
intermediate  results,  and 

2.  To  quantify  the  magnitude  of  all  identified  leakages. 

To identify and quantify leakages, we use a known-key correlation procedure
based on the correlation power analysis (CPA) technique [BCO04, MOP07]. The
primary difference between a CPA attack and our technique is that herein the
objective is not to infer a key (which is known a priori to the evaluator) but rather to
identify leakages and quantify their magnitudes. Therefore, only the correct (known)
sub-keys need be considered.

Although the evaluator may have detailed knowledge of precisely when each
intermediate computation occurs, this technique does not rely on any such
information. This keeps the approach as general as possible while reducing the risk of
an evaluator inadvertently overlooking leakages that occur outside of the expected
times.

The basic procedure involves computing pairwise correlation coefficients
between each column of the hypothetical leakage matrix, $\mathbb{H}$, and each column of the
measurement matrix, $\mathbb{L}$. The AESKnownKeyCorrelation algorithm in Fig. 4.6 illustrates
the procedure for a byte-oriented leakage model.

A variation of this known-key correlation procedure is exhaustively performed
for the intermediate results of the AES encryption and key scheduling algorithms
under each considered leakage model, adjusting the procedure as necessary to
account for the varying data sub-block size. Correlation results are also computed
directly for the full key schedule matrix, $\mathbb{K}$, to identify any leakages due to direct
key manipulation not captured in modeling of the specified algorithm.

The output of this step is a correlation matrix, $\mathbb{R}$, for each considered leakage
model. Each element of $\mathbb{R}$ represents the correlation between a) the hypothetical
leakage modeled from a particular computational step or intermediate result of the
cipher and b) the observed side-channel signal sampled at some instant in time
(relative to the start of the cipher), computed across all traces.

The overall result is analyzed both subjectively (through visual analysis of the
graphical representation) and statistically to identify aspects of the implementation that
exhibit problematic leakages.

It is assumed that the start of each trace is temporally aligned, i.e., the first
sample of each trace corresponds to the same instant in time relative to the actual
start of the encryption operation. For evaluations conducted by a system designer
this can easily be accomplished by designing the implementation to supply a trigger
signal to the acquisition system at the appropriate time. For systems where
precise triggering is not possible, alternate approaches are to use a real-time
signal-monitoring device such as the commercially available Riscure icWaves device [Ris09]
or temporal alignment through post-acquisition signal processing.

    function AESKnownKeyCorrelation(H, L)
        for r ← 0, 3 do
            for c ← 0, 3 do
                for i ← 0, N_r − 1 do
                    for j ← 0, 3 do
                        for s ← 0, N_s − 1 do
                            R^{i,j}_{r,c,s} ← Corr(H^{i,j}_{r,c}, L_s)
                        end for
                    end for
                end for
            end for
        end for
        return R
    end function

Figure 4.6  Pseudo code to perform a known-key correlation between a
byte-oriented hypothetical leakage matrix, $\mathbb{H}$, and the measurement matrix, $\mathbb{L}$.
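
A compact sketch of the known-key correlation is given below. It assumes the (i, j, r, c) indices of the hypothetical leakage matrix have been flattened into a single column index, so that both inputs are plain lists of rows (one row per trace). The toy data at the end is synthetic: one sample instant is made to leak the Hamming Weight of a known byte, purely to show that the procedure locates it. The function names are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Sample correlation coefficient of two equal-length sequences, as in (4.9)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def known_key_correlation(H, L):
    """R[h][s]: correlation of prediction column h with sample column s over all traces."""
    h_cols, l_cols = list(zip(*H)), list(zip(*L))
    return [[pearson(h, l) for l in l_cols] for h in h_cols]


# Synthetic demonstration (not measured data).
rng = random.Random(0)
n_traces, n_samples, leak_at = 500, 20, 7
known_bytes = [rng.randrange(256) for _ in range(n_traces)]
H = [[bin(v).count("1")] for v in known_bytes]             # one HW prediction per trace
L = [[rng.gauss(0.0, 1.0) for _ in range(n_samples)] for _ in range(n_traces)]
for t in range(n_traces):
    L[t][leak_at] += H[t][0]                               # inject the leak at one instant

R = known_key_correlation(H, L)
peak = max(range(n_samples), key=lambda s: abs(R[0][s]))
print(peak, round(R[0][peak], 2))                          # peak is expected at leak_at
```

Because every prediction column is correlated against every sample column, no prior knowledge of when a computation occurs is needed, at the cost of one correlation per (prediction, sample) pair.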


Step 5. Interpretation of results.  For a given $\rho$, the number of traces needed
to determine whether the correlation is statistically significant (different from zero)
can be determined using [MOP07]

$$N = 3 + 8 \left( \frac{Z_{1-\alpha}}{\ln\left(\frac{1+\rho}{1-\rho}\right)} \right)^{2} \qquad (4.15)$$

where $Z_{1-\alpha}$ is the critical value for $1 - \alpha$ statistical confidence, which can be computed
using statistical software or looked up in most statistical textbooks. Mangard, et al.
suggest using $\alpha = 0.0001$ (99.99% confidence) which gives $Z_{1-\alpha} = 3.719$ to estimate the
number of traces required to mount a successful DPA attack [MOP07].

This same relationship can estimate the maximum tolerable leakage (in terms
of correlation coefficient) to achieve DSCA resistance for a specified number of traces.
Solving (4.15) for $\rho$ yields

$$\rho_{max} = \frac{\exp\left(\sqrt{\frac{8 Z_{1-\alpha}^{2}}{N - 3}}\right) - 1}{\exp\left(\sqrt{\frac{8 Z_{1-\alpha}^{2}}{N - 3}}\right) + 1}. \qquad (4.16)$$

If the comprehensive leakage assessment reveals any correlations that exceed
the maximum threshold, $\rho_{max}$, then there is a high likelihood that the implementation
fails to meet the desired security objective. Table 4.2 illustrates the relationship
between the maximum tolerable $\rho$ for a given objective number of traces, $N_t$, as a
function of the required statistical significance, $\alpha$. For example, to achieve DSCA
resistance for up to $N_t = 100,000$ traces using $\alpha = 0.0001$, the leakage should meet
the condition of $\rho < 0.0166$ for all considered leakage models.
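
Equation (4.15) is straightforward to evaluate numerically. The sketch below assumes a one-sided critical value taken from the standard normal distribution, which reproduces the $Z = 3.719$ quoted for $\alpha = 0.0001$; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def traces_required(rho: float, alpha: float = 0.0001) -> float:
    """Number of traces from (4.15) needed to distinguish correlation rho from zero."""
    z = NormalDist().inv_cdf(1.0 - alpha)          # critical value Z_{1-alpha}
    return 3 + 8 * (z / math.log((1 + rho) / (1 - rho))) ** 2


print(round(NormalDist().inv_cdf(1 - 0.0001), 3))  # 3.719
print(round(traces_required(0.0166)))              # on the order of 100,000 traces
print(round(traces_required(0.1)))                 # far fewer traces for a stronger leak
```

The required number of traces grows roughly with $1/\rho^{2}$, so halving the observed correlation approximately quadruples the effort an adversary needs.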


Our results during extensive pilot studies suggest that $\alpha = 0.0001$ provides
a good guideline for estimating the number of traces necessary for a completely
successful differential CPA attack, i.e., one that successfully eliminates all guessing
entropy. However, in many scenarios, a CPA attack can be considered successful if
the key candidates are sufficiently reduced to make brute forcing the remaining search
space computationally feasible. Thus, we suggest using a more conservative $\alpha = 0.1$
when using (4.16) as a test of DSCA resistance. Returning to the previous example,
under this more conservative guideline an implementation should not exhibit any
correlations that exceed 0.0057 to claim DSCA resistance for up to $N_t = 100,000$
traces.


Note that any claim of SCA resistance under this test is valid only under
the evaluated combinations of leakage model parameters. More accurate leakage
models may enable an adversary to successfully extract key material with fewer traces
than indicated by this test. Additionally, it is important to keep in mind that this
procedure does not directly evaluate resistance to advanced profiling techniques such
as template attacks.

Table 4.2  Maximum tolerable correlation ($\rho_{max}$) to achieve $N_t$-trace
DSCA-resistance as a function of $\alpha$, predicted using (4.16).

                        Conf. Level ($\alpha$)
    $N_t$           0.1       0.01      0.001     0.0001
    1,000           0.0573    0.1038    0.1375    0.1650
    10,000          0.0181    0.0329    0.0437    0.0526
    100,000         0.0057    0.0104    0.0138    0.0166
    1,000,000       0.0018    0.0033    0.0044    0.0053
    10,000,000      0.0006    0.0010    0.0014    0.0017
    100,000,000     0.0002    0.0003    0.0004    0.0005
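
The thresholds in Table 4.2 can be regenerated from (4.16) with a short script. As before, the one-sided normal critical value is an assumption consistent with the quoted $Z = 3.719$, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def rho_max(n_traces: int, alpha: float) -> float:
    """Maximum tolerable correlation from (4.16) for n_traces-trace resistance."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    x = z * math.sqrt(8.0 / (n_traces - 3))
    return (math.exp(x) - 1.0) / (math.exp(x) + 1.0)


alphas = (0.1, 0.01, 0.001, 0.0001)
for n in (1_000, 10_000, 100_000, 1_000_000):
    print(f"{n:>11,}", "  ".join(f"{rho_max(n, a):.4f}" for a in alphas))
```

For example, `rho_max(100_000, 0.0001)` rounds to 0.0166 and `rho_max(100_000, 0.1)` rounds to 0.0057, matching the two worked examples in the text.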


4.5  Experimental Methodology

To evaluate the effectiveness of our techniques, the full leakage mapping
approach was carried out against several different hardware and software AES
implementations. We present results from two representative implementations (Imp. A
and Imp. B); the characteristics of the two studied implementations
are summarized in Table 4.3.

4.5.1 Implementation A. Imp. A was implemented on a low-cost 16-bit
PIC micro-controller unit (MCU) representative of the type used extensively in commercial
security applications (e.g., smart-cards, garage door openers, remote keyless
entry systems, etc.), and fabricated using an unspecified 180-nm process technology.
For device control and measurement, the PIC was mounted on an evaluation board
and controlled through a serial interface. The development board was powered from
a standard lab DC power supply to reduce effects of uncontrolled supply voltage
fluctuations.




The radiated EM data from Imp. A was collected using a specially designed
RF near-field probe with a wide-band preamplifier connected to a Lecroy 104-XI-A
high-speed digital sampling oscilloscope. The EM data was collected from an
unmodified PIC MCU. The chip was not specially prepared or decapsulated, and all
EM measurements were taken by placing the probe directly over the area of the chip
that exhibited maximal energy in a narrow band around the known clock rate.

All data was collected at a sample rate of $f_{s1} = 5$ GSa/sec with a $W_{LP} = 1$ GHz
low-pass anti-aliasing filter inserted between the probe and the oscilloscope. The acquired
data was down-sampled to an analysis sample rate of $f_{s2} = 200$ MSa/sec
using proper decimation (i.e., every 25th sample is retained and all others are discarded).
The higher sample rate was chosen to permit post-acquisition simulation
of various receiver configurations using data collected under identical environmental
conditions.
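The decimation step can be illustrated with a minimal sketch. It assumes that a trace is held as a simple sequence of samples and that the retained samples start at index 0; the phase of the retained samples is not specified in the text, and the names below are hypothetical.

```python
def decimate(samples, factor=25):
    """Retain every `factor`-th sample (indices 0, factor, 2*factor, ...)."""
    return samples[::factor]


ACQUISITION_RATE = 5_000_000_000               # 5 GSa/sec acquisition rate
DECIMATION_FACTOR = 25
ANALYSIS_RATE = ACQUISITION_RATE // DECIMATION_FACTOR  # resulting analysis rate

trace = list(range(1000))                      # stand-in for one acquired trace
reduced = decimate(trace, DECIMATION_FACTOR)   # 1000 samples -> 40 samples
```

Retaining one sample in 25 reduces the 5 GSa/sec acquisition rate to the 200 MSa/sec analysis rate.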

For leakage mapping, a total of 100,000 traces were acquired using random
keys and plain-texts drawn from the uniformly distributed key and message spaces.
The signals were analyzed in their raw-acquired form without any averaging or other
noise reduction post-collection processing steps. For attack validation, additional
measurements were taken from the same device under fixed key conditions for $N_k = 100$
random keys, and $N_t = 1000$ traces per key.


4.5.2 Implementation B. The data for Imp. B is the publicly available
power consumption data from the second DPA Contest [Par10]. The DPA Contest
data set was chosen to demonstrate the feasibility of exhaustive leakage mapping for
a device fabricated using the more modern 65 nm process technology and to allow
reproduction of the results presented herein.

The Imp. B data set was collected from a Virtex 5 FPGA, which is a modern
reprogrammable logic device fabricated using a 65 nm process technology [Xil11]. The
Virtex 5 was mounted on the commercially available SASEBO-GII board [ISR11],
which is specially designed to facilitate control and side channel measurements of
cryptographic designs.

Table 4.3 Summary of characteristics for evaluated implementations.

Implementation ID     Imp. A                 Imp. B
Algorithm             AES                    AES
Key Size              128                    128
Mode                  Encryption             Encryption
Type                  Software               Hardware
Device                Microchip PIC24 MCU    Xilinx Virtex 5 FPGA
Clock Rate            29.5 MHz               24.0 MHz
Side Channel          EM (Near-field RF)     Power
Sample Rate           200 MSa/sec            5 GSa/sec
# Profiling Traces    100,000                1,000,000

For leakage mapping, the DPA Contest template data set was used. The template
data set includes 1,000,000 traces collected using random keys and plain-texts.
For attack validation, we used the DPA Contest public data set, composed of
$N_t = 20,000$ traces for each of $N_k = 32$ random keys. According to the available
documentation, each trace in these data sets was averaged 10 times during acquisition
to reduce the effect of external noise sources.


Additional details on the experimental setup and acquisition procedures for
Imp. B are available on the DPA Contest [Par10] and SASEBO [ISR11] websites.


4.5.3 Data Alignment. For experimental efficiency, both sets of data were
acquired with the aid of a trigger signal asserted by the target device at the start of
each encryption operation of interest. This allows for near perfect alignment without
requiring extensive post-collection processing of the acquired data. No further
alignment was performed on the Imp. A data set prior to processing. The DPA
Contest data sets used for Imp. B are also nearly perfectly aligned.

In practice, a trigger is not required, and we successfully demonstrated acquisition
setups with real-time triggering using only the monitored side channel emissions
of both device types. Thus, the lack of an available trigger signal does not prohibit
employment of our techniques, but does increase the required data processing time.
The data from each target implementation was collected using random plain-text /
key combinations for the number of profiling traces indicated in Table 4.3.

4.5.4  Resource  Requirements.  All  results  were  obtained  on  a  standard 
scientific  workstation  equipped  with  dual  quad-core  Xeon  processors,  a  two  terabyte 
hard-drive, and 72 GB of RAM. A comparable workstation can be obtained or built
for approximately $5,000 U.S. in 2011. All code was written in Matlab and optimized for
computation speed (rather than memory efficiency).

Execution  of  the  full  leakage  mapping  procedure  for  either  implementation, 
across  all  considered  parameter  combinations,  takes  approximately  12  hours  on  this 
workstation.  Although  the  memory  requirements  are  high,  the  same  results  should  be 
obtainable  using  a  workstation  with  significantly  less  available  memory  by  optimizing 
the  computations  to  work  primarily  from  disk. 

4.6 Results

This section contains a representative sample of the results obtained by applying
the full leakage mapping procedure to the two AES implementations described
in Sec. 4.5.

The technique introduced in Sec. 4.4 was applied to selected data sets to assess
the overall leakage from each implementation. The resulting data was evaluated to
identify the maximal exhibited leakages, and the resulting leakage maps were studied
to determine how much detail about the implementations a naive adversary would
be able to obtain from the application of typical differential attack techniques under
common leakage models.

Table 4.4 Summary of correlation results for all considered byte-oriented leakage
models.

                     Implementation A         Implementation B
Leakage Model        Min     Max     Mean     Min     Max     Mean
$\mathcal{HW}$       0.77    0.91    0.84     0.00    0.08    0.01
$\mathcal{HD}_1$     0.63    0.87    0.73     0.01    0.11    0.04
$\mathcal{HD}_2$     0.01    0.63    0.17     0.00    0.07    0.01
$\mathcal{HD}_3$     0.01    0.02    0.02     0.04    0.07    0.05
$\mathcal{HD}_4$     0.01    0.07    0.03     0.04    0.07    0.05

Some  selected  results  of  these  analyses  are  presented  below.  Note  that  although 
the  results  obtained  through  analysis  of  non-byte  oriented  leakage  models  are  highly 
informative,  the  results  presented  herein  are  restricted  to  the  more  common  byte- 
oriented  model  in  the  interest  of  space. 

4.6.1 Intermediate Cipher Leakages. The information leakage from each
implementation was evaluated using the procedure in Sec. 4.4 for both cipher and
key expansion intermediate computations under all considered leakage models. Table
4.4 summarizes the minimum, maximum, and mean correlation results observed for
the considered byte-oriented ($N_D = 8$) leakage models.

As expected, the observed leakage from Imp. A (software) is much stronger
than the leakage observed from Imp. B (FPGA). This can be attributed primarily
to the architecture of the FPGA implementation, which processes all 16 bytes of
the AES state in parallel. The uncorrelated parallel activity has the effect of acting
as a noise generator and reducing the effective signal-to-noise ratio for the targeted
intermediate result(s).

In contrast, the software implementation is constrained by the limited size of
the data path, and all circuit activity (with the exception of other on-chip housekeeping
or peripheral activity) within a particular clock cycle is thus dedicated to a
single AES primitive operation.³
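The effect of parallel activity on the observable correlation can be illustrated with a small simulation. The sketch below is an idealized model, not the measured data: it assumes a noise-free leakage equal to the total Hamming Weight of all bytes processed in the same clock cycle, with independent and uniformly distributed byte values. All function names are hypothetical.

```python
import random


def hamming_weight(x):
    """Number of '1' bits in a non-negative integer."""
    return bin(x).count("1")


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulated_correlation(n_parallel_bytes, n_traces=5000, seed=1):
    """Correlation between the HW of one targeted byte and the total leakage."""
    rng = random.Random(seed)
    model, leak = [], []
    for _ in range(n_traces):
        state = [rng.randrange(256) for _ in range(n_parallel_bytes)]
        model.append(hamming_weight(state[0]))               # targeted byte only
        leak.append(sum(hamming_weight(b) for b in state))   # all bytes leak at once
    return pearson(model, leak)


serial = simulated_correlation(1)      # one byte per cycle, as in a byte-serial design
parallel = simulated_correlation(16)   # 16 bytes per cycle, as in a parallel design
```

In this idealized model, the correlation for a single targeted byte falls from 1 to about 1/sqrt(16) = 0.25 when 16 independent bytes are processed together, before any measurement noise is considered.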

4.6.2 Leakage Maps. Although a tabular summary of the leakage assessment
can capture the overall results, a visual presentation of the results permits a
much more natural and efficient interpretation given the large number of considered
parameter variations.

The visual presentation also aids the evaluator in drawing intuitive conclusions
about what the most problematic leakages are, when and under what model
parameters they occur, as well as quickly highlighting the presence of any unexpected
leakages. Because the visual representation can be thought of as a mapping
of the observed leakages in terms of time and intermediate results, we refer to them
as leakage maps.

4.6.2.1 Temporal Leakage Maps. A temporal leakage map is constructed
as a two-dimensional plot where the x-axis represents the sample time, and
the y-axis corresponds to the intermediate step being modeled. Each correlation coefficient
is represented as a single pixel, where the pixel intensity represents the relative
strength of the identified correlation at a particular instant in time, normalized to
the maximum observed correlation for the currently considered leakage model. One
temporal leakage map is prepared for each data sub-block under each combination
of model parameters, i.e., ($N_m = 128 / N_D$) separate leakage maps are prepared for
each leakage model, where $N_D$ is the number of bits in each data sub-block.
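The construction of a temporal leakage map can be sketched as follows. This is a minimal illustration under stated assumptions: traces are held as plain lists, one list of predicted leakage values is supplied per modeled intermediate step, and the data in the usage example is synthetic. The function names are hypothetical and do not correspond to the original Matlab implementation.

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0


def temporal_leakage_map(traces, predictions):
    """Rows: modeled steps. Columns: sample times. Values: |correlation| / maximum."""
    n_samples = len(traces[0])
    columns = [[trace[j] for trace in traces] for j in range(n_samples)]
    rows = [[abs(pearson(pred, col)) for col in columns] for pred in predictions]
    peak = max(max(row) for row in rows)
    return [[v / peak if peak > 0 else 0.0 for v in row] for row in rows]


# Synthetic usage: step 0 leaks at sample 1 and step 1 leaks at sample 3.
rng = random.Random(0)
step0 = [rng.randrange(9) for _ in range(200)]   # e.g., predicted Hamming Weights
step1 = [rng.randrange(9) for _ in range(200)]
traces = [[rng.random(), step0[i], rng.random(), step1[i]] for i in range(200)]
leak_map = temporal_leakage_map(traces, [step0, step1])
```

Each entry of `leak_map` corresponds to one pixel of the plot, so the intensity can be rendered directly, or with contours when the leakage is sparse.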

³ Because the PIC24 has a 16-bit data path, some operations are actually optimized to manipulate
two bytes of state information or intermediate results at a time.

Because the leakage characteristics of sequential software and parallel hardware
implementations are very different, the results of the assessments for Imp. A and B
are each presented slightly differently.

For Imp. A the leakage due to each modeled intermediate step is sparse compared
to the dimensionality of the correlation result matrices. As a result, it is difficult
to visually discern the leakages if each correlation coefficient is simply mapped
to a pixel intensity. Therefore, the correlation results for Imp. A are prepared using
contour plots, which highlight the sparse leakages. A typical example of a temporal
leakage map for Imp. A is shown in Fig. 4.7. This figure shows the magnitude of the
correlation observed between the modeled leakages of all AES intermediate results
under a byte-oriented Hamming Weight leakage model.


Several interesting observations can be made about the leakage from Imp. A
based on a visual inspection of Fig. 4.7. First, the $N_r = 10$ encryption rounds
and four primitive operations within each round are immediately obvious. Second,
the EM side channel significantly leaks the Hamming Weight of all algorithmically
specified intermediate results computed throughout the entire encryption operation.
Thus, it can be inferred that this particular implementation closely follows the AES
specification. Note that the similarity between the correlation of steps 2-3 during
each round is expected under a Hamming Weight model since the SR operation only
permutes the state bytes and does not actually change any intermediate results.


For  Imp.  B,  the  observed  leakages  are  no  longer  sparse  in  comparison  to  the 
dimensionality  of  the  results  matrices,  and  a  direct  mapping  from  the  correlation 
matrix  to  the  temporal  leakage  map  is  possible.  A  typical  example  of  a  temporal 
leakage map for Imp. B is shown in Fig. 4.8.

The example shown is the result of the leakage assessment for a byte-oriented
$\mathcal{HD}_4$ leakage model. Although direct exploitation of the $\mathcal{HD}_4$ model has until
recently been considered impractical for the standard DPA-class of attacks, analysis
of the results from the $\mathcal{HD}_4$ leakage model reveals a great deal of information about
what an adversary could discern about the underlying implementation architecture
through the application of similar techniques. Furthermore, recent results indicate
advances in computing technology are now making direct attacks on 32-bit key
hypotheses feasible [MKP11], which makes consideration of such leakage models even
more relevant.

Figure 4.7 Temporal leakage map for Imp. A under the byte-oriented $\mathcal{HW}$ model.
The intensity represents the maximum observed magnitude of the leakage
observed at each sampled instant across all 16 bytes of the AES
state matrix.


The round structure of Imp. B is, again, immediately obvious from a cursory
inspection of Fig. 4.8. However, in contrast to the leakage from Imp. A, Imp.
B does not show high correlations for all computed intermediate results under the
$\mathcal{HD}_4$ model. Rather, the higher relative correlations are isolated to the Hamming
Distance between intermediate results following the ARK steps (i.e., $\mathcal{HD}_4(s^{i,1}, s^{i+1,1})$)
in adjacent rounds. This strongly implies the use of registers at that point in the
algorithm.

Examination of the source code of Imp. B confirms this, and provides experimental
confirmation that the $\mathcal{HD}_4$ leakage transformation is a suitable abstraction
of the physical leakages produced by register updates for this implementation.
These basic observations suggest that while the leakage of information under the
$\mathcal{HD}_4$ model may not lead directly to successful key recovery attacks, the information
leaked could be used by an adversary to craft more powerful techniques that take
advantage of the particular implementation architecture.

Figure 4.8 Temporal leakage map for Imp. B under a byte-oriented $\mathcal{HD}_4$ leakage
model. The x-axis represents sample time, and each band along the
y-axis represents one of the 37 initial states used in the Hamming Distance
computation. Close examination reveals the variations between
individual bytes are also visible within each band. The intensity represents
the magnitude of the observed correlation at each instant in time,
normalized to the maximum observed under this leakage model.

Figure 4.9 Typical correlation plot for Imp. B under a byte-oriented $\mathcal{HD}_4$ leakage
model, illustrating the damped oscillatory nature of the correlation well
beyond the originating clock cycle. The x-axis represents sample time,
and the y-axis represents the observed correlation coefficient.

Another interesting observation is that the leakages of Imp. B appear as a
damped oscillatory response that continues well beyond the clock cycle in which the
modeled computation is carried out. Fig. 4.9 shows a typical plot of the correlation
between a modeled leakage and the power consumption of the circuit observed over
the duration of the encryption operation. It was confirmed through numerous
pilot experiments that the leakages in the later clock cycles do lead to successful DPA
attacks, and can be combined through various techniques to improve the effectiveness
of standard DPA approaches.

4.6.2.2 Summary Leakage Maps. Whereas a temporal leakage map
provides a visual summary of an implementation’s leakage over time, a more useful
tool for summarizing the leakage of each individual data sub-block is the summary
leakage map. The summary leakage map is a concise graphical representation of
the maximum leakages identified for each data sub-block, computational step, and
leakage model. The graph is constructed by determining the maximum observed
(magnitude) correlation for each combination of parameters. The resulting tabular
data is then graphically represented by mapping each correlation coefficient to a
color intensity, normalized to the maximum observed correlation across all of the
considered leakage models.

Summary leakage maps for Imp. A are depicted in Fig. 4.10. The summary
maps for the $\mathcal{HD}_2$, $\mathcal{HD}_3$, and $\mathcal{HD}_4$ models are omitted since, as expected, the software
implementation exhibits no statistically significant leakage under those models.

Inspection of Fig. 4.10 immediately reveals the extent of the problematic
leakages associated with this implementation. As already noted, Imp. A significantly
leaks the Hamming Weight of every intermediate value computed.

Additionally, there is also strong leakage associated with $\mathcal{HD}_1(s^{i,0}, s^{i+1,0})$,
which corresponds to the ARK primitive operation in each round. Although this
could be due to a register update, the observed leakage is more likely the result of
the relationship between the $\mathcal{HD}_1$ model and the AES round keys. Note that the
Hamming Distance across the ARK operation is determined by the number of bits toggled
during the XOR operation with the associated portion of the round sub-key, which
in turn is determined by the number of ‘1’ bits in the key. Thus $\mathcal{HD}_1(s^{i,0}, s^{i+1,0})$ is
equivalent to $\mathcal{HW}(K)$.
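The equivalence between the Hamming Distance across an XOR with the key and the Hamming Weight of the key can be checked exhaustively for a single byte. The following sketch uses hypothetical helper names and considers one state byte and one key byte in isolation.

```python
def hamming_weight(x):
    """Number of '1' bits in a non-negative integer."""
    return bin(x).count("1")


def hamming_distance(a, b):
    """Number of bit positions in which a and b differ."""
    return hamming_weight(a ^ b)


def ark_distance(state_byte, key_byte):
    """Hamming Distance between a state byte before and after XOR with a key byte."""
    return hamming_distance(state_byte, state_byte ^ key_byte)


# The distance depends only on the key byte, never on the state byte.
equivalence_holds = all(
    ark_distance(s, k) == hamming_weight(k)
    for s in range(256)
    for k in range(256)
)
```

Because s XOR (s XOR k) = k, the state byte cancels, which is why this leakage depends on the key alone and not on the plain-text or cipher-text.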

Given this, it appears likely that this implementation leaks the Hamming
Weight of the round keys. In fact, our results indicate that the key leakage exhibited
is sufficient to permit key extraction from this particular implementation
without knowledge of the plain-text or cipher-text.

One final note related to Imp. A is that although it is not evident from the $\mathcal{HD}_1$
summary leakage map, significant (exploitable) leakage is also caused by intermediate
steps other than ARK. However, the relative strength of the ARK leakage has the effect of
masking the presence of these smaller leakages due to the normalization of the color
intensity to the maximum observed correlation. Note that any masked leakages can
be readily identified by a direct examination of the tabular results.

In general, it is assumed that system designers will address the strongest leakages
first since they are presumably the easiest to exploit. It is recommended that
the full leakage mapping procedure be repeated after implementing any SCA countermeasures,
since any changes could incidentally introduce new sources of leakage.
Once the stronger leakages have been eliminated or managed, any remaining leakages
are more easily identified in the graphical leakage maps.
under each considered byte-oriented leakage model for Imp. B. Each row (y-axis) corresponds to one byte
of the AES state matrix, and each column (x-axis) corresponds to the starting step used in the Hamming
Distance computation (or the precise step used in the case of the Hamming Weight model). The intensity
represents the maximum magnitude of the correlation observed for each state byte under a particular
leakage model, normalized to the maximum correlation observed under all leakage models.


Comparing the summary leakage map from Imp. A (Fig. 4.10) to that of
Imp. B (Fig. 4.8) shows how different the leakage characteristics of the two implementations
are. Whereas the PIC micro-controller shows strong leakage of the
Hamming Weight across all intermediate computations, the FPGA implementation
only exhibits strong leakage under the $\mathcal{HD}_1$, $\mathcal{HD}_3$, and $\mathcal{HD}_4$ models. Such leakages
are characteristic of hardware implementations. Here, it is noteworthy that the $\mathcal{HD}_3$
leakage model only exhibits strong (relative) leakage for $\mathcal{HD}_3(s^{9,1}, s^{10,0})$. This is attributed
to the replacement of the MC step in the final round with an additional ARK,
as illustrated in Fig. 4.2. This bears additional scrutiny because the register update
at this location meets the criteria of predictability using standard DPA techniques.


It is believed that the presence of strong $\mathcal{HD}_1$ leakage can be attributed to
direct leakage of the round keys, K, generated during real-time key scheduling,
similar to the direct key leakage exhibited by Imp. A. Again, examination of the
source code supports this hypothesis since the generated round key is stored in a
temporary register for each round. Additional experimentation would be necessary
to confirm whether key expansion is the root cause of this leakage, and whether or
not the leakage is vulnerable to direct key extraction.


4.6.3 Attack Validation. Using (4.16), the correlation results from each
implementation are used to estimate the number of traces a standard correlation
SCA attack would require to correctly extract the full master key. These predictions
were compared to the number of traces actually required using standard correlation-based
DSCA attacks. For Imp. A, the targeted intermediate for the DEMA attack
was the output of the initial (Round 0) SB operation under a Hamming Weight
model, i.e., $\mathcal{HW}(s^{0,2})$. For Imp. B, the target was the Hamming Distance across the
last three states, i.e., $\mathcal{HD}_3(s^{9,1}, s^{10,0})$. Both attacks were carried out using a single
sample taken at the instant found to exhibit the largest magnitude leakage under
the corresponding leakage model.
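A single-sample correlation attack of the kind used against Imp. A can be sketched on simulated data. The example below does not use the measured traces: it assumes an idealized leakage equal to the Hamming Weight of the S-box output plus Gaussian noise, and the key byte, noise level, and trace count are hypothetical values chosen for illustration.

```python
import random


def build_sbox():
    """Generate the AES S-box (field inverse followed by the affine map)."""
    def rotl8(x, n):
        return ((x << n) | (x >> (8 - n))) & 0xFF

    sbox = [0] * 256
    p = q = 1
    while True:
        # p <- p * 3 and q <- q / 3 in GF(2^8), so p * q == 1 throughout
        p = (p ^ (p << 1) ^ (0x1B if p & 0x80 else 0)) & 0xFF
        q ^= q << 1
        q ^= q << 2
        q ^= q << 4
        q &= 0xFF
        if q & 0x80:
            q ^= 0x09
        sbox[p] = q ^ rotl8(q, 1) ^ rotl8(q, 2) ^ rotl8(q, 3) ^ rotl8(q, 4) ^ 0x63
        if p == 1:
            break
    sbox[0] = 0x63  # zero has no inverse and is handled separately
    return sbox


SBOX = build_sbox()


def hamming_weight(x):
    return bin(x).count("1")


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate_leakage(key_byte, n_traces, noise_sd, rng):
    """One leakage sample per trace: HW(SBOX[p ^ k]) plus Gaussian noise."""
    plaintexts = [rng.randrange(256) for _ in range(n_traces)]
    leaks = [hamming_weight(SBOX[p ^ key_byte]) + rng.gauss(0.0, noise_sd)
             for p in plaintexts]
    return plaintexts, leaks


def score_key_hypotheses(plaintexts, leaks):
    """|correlation| between modeled and observed leakage for each of 256 guesses."""
    return [abs(pearson([hamming_weight(SBOX[p ^ guess]) for p in plaintexts], leaks))
            for guess in range(256)]


TRUE_KEY = 0x3C                        # hypothetical key byte
rng = random.Random(2011)
plaintexts, leaks = simulate_leakage(TRUE_KEY, 500, 1.0, rng)
scores = score_key_hypotheses(plaintexts, leaks)
best_guess = max(range(256), key=scores.__getitem__)
```

The hypothesis with the largest correlation magnitude is taken as the most probable key byte. An attack on a Hamming Distance target such as the one used for Imp. B would follow the same structure with a different leakage model.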
Table 4.5 Comparison of number of traces required for successful attack as predicted
by (4.15) vs. actual results of a correlation-based DPA attack on
Imp. B using a $\mathcal{HD}_3(s^{9,1}, s^{10,0})$ leakage model.

          Predicted ($\alpha$)              Actual
          0.1       0.001     0.0001    Indiv. Byte    Full Key     GE < 32 bits
Min.      758       3,054     6,362     300            9,400        2,200
Mean      1,469     5,927     12,352    6,861⁴         17,272⁴      3,883
Max.      2,490     10,051    20,950    > 20,000       > 20,000     4,237

Table 4.5 shows the results of the attack validation for Imp. B. The attack
was attempted against each of the 32 available trace-sets in the DPA Contest public
database, where each trace-set corresponds to a different secret key. Traces used for
the attack were drawn from each available trace set in a randomly determined order.
The minimum number of traces required to extract an individual key byte was 300.
Out of the 32 × 16 = 512 byte extractions attempted, the attack failed to extract 11
of the key bytes after the available trace set ($N_t = 20,000$) was exhausted. Thus,
the standard attack fails to achieve a first-order success rate of 1 [SMY09] given the
limited number of traces.

Unfortunately for cryptographic engineers, the failure of an attack to extract
the correct key with 100% certainty does not imply that the key will not be found.
A more practical measure of attack success in scenarios where one or more
plain-text/cipher-text pairs are available is the remaining Guessing Entropy [SMY09],
which provides a measure of the remaining cost of a brute-force attack given the
current ranking of the true key in the list of key hypotheses.
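One way to turn key rankings into a remaining brute-force cost is sketched below. This is an assumption-laden illustration rather than the exact estimator used for the reported results: it takes the rank of the true key byte among the 256 hypotheses for each byte, and sums the base-2 logarithms of those ranks as if the 16 bytes were searched independently. The function names are hypothetical.

```python
from math import log2


def key_rank(scores, true_key):
    """1-based rank of true_key when hypotheses are sorted by descending score."""
    return 1 + sum(1 for s in scores if s > scores[true_key])


def remaining_entropy_bits(per_byte_scores, true_key_bytes):
    """Assumed estimate: sum over key bytes of log2(rank of the true byte)."""
    return sum(log2(key_rank(scores, key))
               for scores, key in zip(per_byte_scores, true_key_bytes))


# Usage: the first byte is ranked first, the second is ranked fourth.
scores_a = [0.0] * 256
scores_a[0x2B] = 0.9
scores_b = [0.0] * 256
scores_b[0x7E] = 0.5
for better_guess in (0x01, 0x02, 0x03):
    scores_b[better_guess] = 0.6
bits = remaining_entropy_bits([scores_a, scores_b], [0x2B, 0x7E])
```

A correctly ranked byte contributes zero bits, so the total falls toward zero as more traces are used and the true key bytes move to the top of their lists.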


Fig. 4.12 shows the maximum, minimum, and mean guessing entropy for the
attacks against all 32 keys as a function of the number of traces used in the CPA
attack. Clearly, although the attack occasionally fails on the basis of extracting the
full key, the information leakage is sufficient to enable a practical brute force attack
given a much smaller number of traces. If it is assumed, very conservatively, that
it becomes practical to brute force the key when the Guessing Entropy is reduced
to < 32 bits, an average of approximately 4,000 traces are required for an attack to
succeed against this implementation.

Figure 4.12 Guessing entropy of Imp. B as a function of the number of traces
used in a correlation-based DPA attack on $\mathcal{HD}_3(s^{9,1}, s^{10,0})$.

⁴ The mean values shown are estimates since all available $N_t = 20,000$ traces were exhausted
before all key bytes were successfully extracted.


Once again, it is extremely important to note that the abstract algorithmic
model used in this particular attack is very likely to be sub-optimal, and more
accurate models will probably succeed with fewer traces. Thus, when using (4.16) to
predict DSCA-resistance based on the leakage model assessment, it is recommended
to use a conservative $\alpha = 0.1$ or 0.2.


4.6.4 Suitability for Protected Implementations. Initial results indicate
our approach is an effective tool for assessing the leakage from both protected and
unprotected implementations. Several pilot studies, not presented herein, successfully
used leakage mapping to assess the leakage from a software-based smart-card
implementation using common countermeasures. We note that use of the leakage
mapping methodology to effectively assess masked implementations would require
supplementing the simple Hamming Weight and Hamming Distance leakage models
with more complex multivariate leakage models; however, the procedure required to
do this is straightforward.

In general, the introduction of countermeasures significantly reduces the correlation
at a particular instant in time by introducing various forms of noise or reducing
the signal strength. Our systematic approach enables an evaluator to quickly and
efficiently repeat the assessment process to identify how much more resistant an implementation
is after adding a countermeasure. Perhaps more importantly, it also
allows a designer to ensure that the implementation of one countermeasure does not
introduce new unexpected sources of information leakage. The preliminary results
indicate that the leakage mapping assessment methodology is well suited to assess
whether a particular countermeasure is justified given the added cost in time, space,
and energy.

4.7 Conclusion

We have developed a systematic, efficient process through which cryptographic
engineers can assess the resistance of a particular implementation against differential
SCA techniques. While our technique does not guarantee an implementation will be
secure against all future side channel attacks, it does provide an efficient mechanism
through which system designers and testers may gain substantial insight into the level
of security of a particular implementation. Additionally, as illustrated throughout
this paper, examination of the leakage maps permits insights that are not at all
obvious if testing is carried out in a less thorough manner.

Our techniques can be adapted to incorporate more advanced statistical tools
(e.g., mutual information or linear regression models), in particular those capable
of capturing non-linear leakages. However, the inclusion of any such techniques
must be weighed in terms of the added benefit obtained compared to the additional
computational effort required. Additionally, our studies to date have been
focused exclusively on signals in the time domain. Various other work has suggested
that spectral-domain techniques may be even more effective, particularly against
implementations which implement basic countermeasures such as random process
interrupts.


For devices where the template attack scenario [CRR02, ARRS05] is deemed
a legitimate threat, designers must also consider the multivariate leakage from the
device in terms of profiled attacks. The methodology introduced by Standaert et
al. [SMY09] provides a solid foundation on which to evaluate whether a sound leakage
model can be formed to attack the device using profiled attacks. However, an efficient
methodology for formulating the leakage models to be tested in this context remains
an open problem.


4.8 Supplementary Discussion

This section contains additional relevant material that could not be included
in the submitted paper due to mandatory space limitations. It is assumed that most
readers of this article will be familiar with the procedure of DSCA attacks in general,
and correlation-based DSCA attacks in particular. However, for completeness, the
procedure employed in this research is described below.


4.8.1  Correlation-Based DSCA.  The procedure used to carry out the
correlation-based DSCA attacks is based on the correlation DPA described in [MOP07].
The general approach is shown in Fig. 4.13. Each step of the procedure is described
below:


Step 1. Choose a target intermediate computation and corresponding
leakage model. The chosen intermediate computation must depend on some
small portion of the key and some observable or controllable input or output
data t, and can be one or more bits. The results herein are based on
byte-oriented leakage models. For Imp. A, the targeted intermediate was the
output of the SB operation for Round 0 under a Hamming Weight model, i.e.,
$\mathcal{HW}(s_{0,2})$. For Imp. B, the target was the Hamming Distance across the last
three states, i.e., $\mathcal{HD}(s_{9,1}, s_{10,0})$. Note that the first attack assumes an
observable or controllable plaintext, while the second attack assumes an observable
ciphertext.

Figure 4.13  Block diagram of a generalized DSCA attack procedure for one byte
of the key, assuming a $\mathcal{HW}$ leakage model. Note that steps 3-4 require
minor modifications for use with leakage models that consider
transitions between two states, e.g., $\mathcal{HD}(\cdot, \cdot)$.

Step 2. Record side channel data. Record $N_s$ samples of side channel data (e.g.,
strength of the EM field near the circuit) during each of $N_t$ cryptographic
operations performed using an unknown fixed key. Construct the leakage matrix,
L, and known-message matrix, M, as described in Step 1 of Sec. 4.4. Note
that in the attack scenario, the adversary has no knowledge of the secret key,
which is the objective of the attack.


Step 3. Considering one byte of the AES state at a time, compute the
hypothetical intermediate state matrices for the states considered
under the targeted leakage model for all possible (hypothetical) subkeys,
$k_i$, where $i \in \{0, 1, \ldots, 255\}$. The hypothetical results are computed
by considering each possible subkey as the true subkey, and computing the
intermediate computation results for each known byte of the observed message
matrix, M. For the byte-oriented attack, there are $2^8 = 256$ possible subkeys
associated with each byte of the intermediate result. The result of this step is
the $(256 \times N_t \times 4 \times 4)$ hypothetical intermediate result matrix, V.


Step 4. Map each hypothetical intermediate result to a predicted side
channel leakage value under the selected leakage model. In this step,
each element of the matrix V from Step 3 is mapped to an estimated side
channel leakage using the leakage model, i.e., $\mathcal{HW}(V)$ for Imp. A. The result
of this step is the hypothetical leakage matrix, $H_{[256 \times N_t \times 4 \times 4]}$.


Step 5. Compute the correlation between predicted hypothetical values
and observed side channel data. In the final step, the hypothetical side
channel leakages H computed in Step 4 are compared to the actual data L
captured in Step 2 using (4.9), i.e., $R = \mathrm{Corr}(H, L)$. Each column of the
$H_{[(\cdot),(\cdot),r,c]}$ sub-matrix corresponds to one of the 256 hypothesized subkeys for
key byte $K^{0}_{r,c}$ (for Imp. A) or $K^{10}_{r,c}$ (for Imp. B). The columns of L represent
the side channel leakage from the device at a particular point in time. At any
particular point in time, it is not known in advance whether information leakage
is present in the side channel signal due to the targeted intermediate value.
Thus, the adversary must search over all sampled instants to find times when
information about the intermediate value is leaked. The most probable correct
key is the one that exhibits the maximum correlation coefficient between the
modelled leakages under all hypothetical keys and the observed side channel
measurements at any of the considered instants in time. Alternatively, subkey
candidates can be ranked in descending order of their maximum observed
correlation. An adversary with access to a single known plaintext-ciphertext pair
can then attempt to brute-force the true key by trying the keys in order of
likelihood. Note that an evaluator with knowledge of the true key can determine
the remaining guessing entropy by computing the product of the rank of each
of the 16 correct subkeys at the end of the attack procedure.
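
Steps 3 to 5 can be sketched for a single key byte as follows. This is a minimal illustration on simulated data, not the code used in this research. It assumes the targeted intermediate is the AES S-box output of (plaintext byte XOR subkey) under a Hamming Weight model, and that candidates are ranked by maximum absolute correlation. The names `cpa_byte`, `rank_subkeys`, and `simulate_traces`, as well as the trace counts and noise level, are hypothetical.

```python
import numpy as np


def _gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial 0x11B."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def _build_sbox():
    """Generate the AES S-box: field inverse followed by the affine map."""
    sbox = np.zeros(256, dtype=np.uint8)
    for x in range(256):
        inv = 0
        if x:
            inv = 1
            for _ in range(254):          # x^254 is the inverse of x
                inv = _gf_mul(inv, x)
        out = 0x63
        for s in range(5):                # inv ^ rotl(inv,1) ^ ... ^ rotl(inv,4)
            out ^= ((inv << s) | (inv >> (8 - s))) & 0xFF
        sbox[x] = out
    return sbox


SBOX = _build_sbox()
HW = np.array([bin(v).count("1") for v in range(256)], dtype=np.float64)


def cpa_byte(traces, plaintext_bytes):
    """Return the (256, Ns) correlation matrix for one key byte."""
    keys = np.arange(256, dtype=np.uint8)
    # Step 3: hypothetical intermediates for every subkey candidate.
    v = SBOX[plaintext_bytes[None, :] ^ keys[:, None]]       # (256, Nt)
    # Step 4: map intermediates to predicted leakage (Hamming Weight).
    h = HW[v]                                                # (256, Nt)
    # Step 5: Pearson correlation of each hypothesis with each time sample.
    hc = h - h.mean(axis=1, keepdims=True)
    lc = traces - traces.mean(axis=0, keepdims=True)
    num = hc @ lc                                            # (256, Ns)
    den = np.sqrt((hc ** 2).sum(axis=1)[:, None] * (lc ** 2).sum(axis=0)[None, :])
    return num / den


def rank_subkeys(r):
    """Order subkey candidates by maximum absolute correlation over time."""
    score = np.abs(r).max(axis=1)
    return np.argsort(-score), score


def simulate_traces(true_key, n_traces=500, n_samples=50, leak_at=20, seed=0):
    """Simulated data (assumption): unit Gaussian noise plus HW leakage at one sample."""
    rng = np.random.default_rng(seed)
    pt = rng.integers(0, 256, size=n_traces, dtype=np.uint8)
    traces = rng.normal(0.0, 1.0, size=(n_traces, n_samples))
    traces[:, leak_at] += HW[SBOX[pt ^ np.uint8(true_key)]]
    return traces, pt
```

An evaluator who knows the true subkey could locate its rank in the ordering returned by `rank_subkeys`, for example with `np.where(order == true_key)`, and repeat this for each of the 16 key bytes.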
5.  Conclusion


This  chapter  concludes  the  main  document  and  provides  an  overall  summary  of  the 
research  activities  and  key  findings,  followed  by  several  recommendations  for  future 
research. 


5.1  Research  Summary 

The information leakage of electronic devices, especially those used in cryptographic
or other vital applications, represents a serious practical threat to secure
systems.  While  physical  implementation  attacks  have  evolved  rapidly  over  the  last 
decade,  relatively  little  work  has  been  done  to  allow  system  designers  to  effectively 
counter  the  identified  threats.  This  work  addresses  the  technology  gap  between  the 
identified  problems  and  potential  solutions,  and  makes  significant  contributions  to 
the  study  of  information  leakage  in  two  primary  areas  of  investigation: 

1.  RF-DNA  fingerprinting  of  integrated  circuits  for  device  authentication,  and 

2.  Leakage mapping to assess the information leakage from arbitrary cryptographic
implementations.

The  results  and  major  contributions  related  to  each  area  of  investigation  are 
described  below. 


5.1.1  RF-DNA Fingerprinting of Integrated Circuits.  Unintentional
electromagnetic (EM) emissions were investigated as a source of information to recognize
or verify the identity of a unique integrated circuit (IC). The technique
investigated, known as radio frequency distinct native attribute (RF-DNA) fingerprinting,
was adapted from previous work (cf. [SITMM08, KTM09, RTM10, RPT11, HBK06,
WMTM10, WTR10]) on intentional EM emissions. The technique was successfully
adapted herein to recognize individual microchips based on fabrication
process-induced variations in each chip’s unintentional RF emissions in a manner analogous
to biometric human identification.

The problem of IC authentication has numerous practical applications, including
1) providing enhanced security for secure access mechanisms (e.g., anti-cloning),
2)  detection  of  unauthorized  modifications  to  circuit  designs  (e.g.,  hardware  Trojan 
detection),  and  3)  forensic  attribution  of  electronic  evidence  in  criminal  or  other 
cases.  It  is  believed  that  this  work  is  the  first  to  propose  and  demonstrate  the 
feasibility  of  using  the  unintentional  emissions  for  IC  recognition. 

Whereas all previously known IC recognition techniques require either hardware
or software modifications to the device being recognized, the RF-DNA fingerprinting
technique permits passive authentication based on analysis of the unintentional
emissions produced during pre-existing processes and protocols. Because it
is  passive,  it  is  suitable  for  security  applications  involving  commodity  commercial 
ICs  without  requiring  any  modifications  to  the  device  being  authenticated.  Thus, 
the proposed approach is very promising for security applications such as those requiring
detection of cloned, copied, or counterfeited devices. Furthermore, because
the  technique  does  not  require  any  modifications  to  the  ICs,  the  approach  is  more 
cost-effective  and  scalable  than  other  known  techniques  for  applications  involving 
commodity  commercial  ICs. 

In  addition  to  being  the  first  application  of  RF-DNA  fingerprinting  techniques 
to  the  unintentional  emissions  of  ICs,  this  research  extends  the  previous  work  related 
to the intentional emissions of wireless networking equipment in two new ways.
Previous  RF-DNA  work  has  predominantly  considered  device  identification  tasks. 
However,  the  primary  use  case  envisioned  for  IC  fingerprinting  is  to  counter  cloning 
and  related  threats,  which  requires  identity  verification.  A  systematic  approach  was 
developed  and  introduced  to  evaluate  the  effectiveness  of  RF-DNA  fingerprinting  in 
the context of both identification and verification tasks. Additionally, the RF-DNA
fingerprinting technique was extended from a fixed 3-class approach to one capable of
identifying or verifying the fingerprints of devices for an arbitrary N-class problem.

An  extensive  empirical  study  was  conducted  and  the  performance  of  RF-DNA 
fingerprinting was evaluated under a wide range of simulated noise conditions. Empirical
results indicate the technique scales well for both identification and verification
tasks involving 40 near-identical devices. For experimentally collected emissions,
the technique correctly identifies devices more than 99.5% of the time, with average
verification equal error rates (EERs) of less than 0.05% achieved using a single
extracted  fingerprint.  Correct  identification  success  rates  of  better  than  90%  were 
maintained  under  analysis  conditions  of  SNR  >  15  dB. 
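
For readers unfamiliar with the verification metric, the equal error rate can be estimated from match scores roughly as sketched below. The sketch assumes that a higher score indicates a better match to the claimed identity and that a claim is accepted when the score meets the threshold. The function name `equal_error_rate` and the scores in the usage note are hypothetical and are not data from this work.

```python
import numpy as np


def equal_error_rate(genuine_scores, impostor_scores):
    """Estimate the EER by sweeping the acceptance threshold over all scores.

    Returns (eer, threshold) at the point where the false accept rate (FAR)
    and false reject rate (FRR) are closest to each other.
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    thresholds = np.unique(np.concatenate([genuine, impostor]))
    best = None
    for t in thresholds:
        frr = np.mean(genuine < t)      # authentic devices wrongly rejected
        far = np.mean(impostor >= t)    # impostor devices wrongly accepted
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, float(t))
    return best[1], best[2]
```

For example, `equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.1, 0.2, 0.3, 0.6])` finds that FAR and FRR are both 25% at a threshold of 0.6.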

The impressive performance indicates that RF-DNA fingerprinting is adaptable
to  less  ideal  conditions  while  still  providing  acceptable  results.  Finally,  these  results 
were  obtained  using  a  single  extracted  fingerprint.  A  substantial  improvement  in 
performance  is  believed  to  be  realizable  through  a  straightforward  extension  of  the 
approach  for  multiple  extracted  fingerprints. 


5.1.2  Leakage  Mapping.  The  second  major  contribution  of  this  work  is 
the  development  and  demonstration  of  a  leakage  mapping  methodology  for  assessing 
the information leakage from arbitrary block cipher implementations. Prior to this
work, [SMY09] provided the only proposed methodology to enable system evaluators
to  quantitatively  bound  the  leakage  from  an  evaluated  implementation.  However, 
this  earlier  work  relies  on  the  evaluator’s  ability  to  build  an  optimal  template  attack, 
and  the  end  result  is  limited  in  focus  to  a  small  portion  of  the  overall  cryptographic 
algorithm. 


The  framework  proposed  here  provides  a  comprehensive  approach  to  assess 
the information leakage from all algorithmically specified key-dependent intermediate
computations for implementations of symmetric block ciphers. The resulting
leakage assessment quantitatively bounds the resistance of an implementation to
the general class of differential side-channel analysis (SCA) techniques, and provides
system  designers  and  evaluators  with  a  tool  that  can  be  used  to  objectively  assess 
whether  countermeasures  implemented  are  justified  given  the  added  cost  in  time, 
space,  and  energy  compared  to  the  obtained  reduction  in  exploitable  information 
leakage.  Furthermore,  the  systematic  approach  enables  evaluators  to  quickly  and 
efficiently  repeat  the  assessment  process  for  different  variations  of  implementations, 
which helps to ensure the addition of countermeasures does not inadvertently introduce
new unexpected sources of information leakage.

While  using  this  technique  does  not  guarantee  an  implementation  will  be  secure 
against  all  future  side-channel  attacks,  it  does  provide  an  efficient  mechanism  through 
which  system  designers  and  testers  may  gain  substantial  insight  into  the  level  of 
security  of  a  particular  implementation.  Examination  of  the  leakage  maps  permits 
insights  that  are  not  at  all  obvious  when  testing  is  carried  out  in  a  less  thorough 
manner. 

The  framework  was  demonstrated  using  the  well-known  Hamming  Weight  and 
Hamming Distance leakage models, with recommendations for extension of the technique
to more accurate models. The approach was applied to two typical unprotected
implementations  of  the  Advanced  Encryption  Standard  (AES),  and  the  assessment 
results  were  empirically  validated  against  correlation-based  differential  power  and 
electromagnetic  analysis  (DPA/DEMA)  attacks. 

5.2  Recommendations  for  Future  Research 

A  number  of  recommendations  for  future  research  were  made  in  each  article 
in  the  main  body  of  this  work.  Those  recommendations  are  revisited  here  with 
additional  discussion. 

5.2.1  RF-DNA Fingerprinting of ICs.  Although the results of the empirical
studies conducted thus far are very promising, a considerable amount of work
remains to fully understand the suitability of RF-DNA fingerprinting for practical
security  implementations.  Intuitively,  the  nature  of  the  intrinsic  characteristics  that 
induce  inter-device  variations  suggests  a  fingerprint  based  on  those  variations  will  be 
extremely  difficult  to  impersonate.  However,  further  analysis  and  experimentation 
are  needed  to  confirm  this.  Particular  areas  for  additional  study  include: 

5.2.1.1  Permanence and robustness of RF-DNA features under varying
environmental conditions.  It is likely that normal operation of an IC over its
lifespan will affect the physical device structure and resulting RF-DNA fingerprints.
More studies are necessary to assess the sensitivity of RF-DNA fingerprinting to such
physical changes over long periods of time. If structural changes are found to cause
the fingerprint to change significantly in a way that adversely affects fingerprint
performance, one solution might be to design the authentication system to update the
training database and reference fingerprints each time a device is successfully
authenticated. In practical implementations, uncontrolled environmental fluctuations (e.g.,
temperature or supply voltage) are also expected to affect the fingerprints. Previous
work has shown that environmentally-induced fingerprint variations for intentional
emitters can be compensated for effectively by conducting the enrollment training
procedure over the range of expected operating temperatures and voltages [TSU04].
Additional studies should be conducted to assess the suitability of this approach for
application to the unintentional emissions of ICs.

5.2.1.2  Sensitivity of fingerprint performance to variations due to different
sensor modules or sensor positioning.  Another anticipated source of performance
degradation is variation in the physical sensor characteristics, receiver
components, and sensor positioning relative to the IC. The experiments conducted
in this work used a single sensor and receiver module and controlled the sensor
positioning over each IC. In practice, each device reader introduces its own unique effects
on the fingerprint characteristics. Thus, the fingerprinting procedure must select
features that are insensitive to receiver- and sensor-induced variations. The sensitivity
of RF-DNA fingerprints to such variations should be studied further to determine if
performance is still acceptable for different sensor and receiver modules under more
realistic operating conditions. A logical test case is smart-card based authentication
tokens, where the amount of variation in the positioning of the smart-card and
an embedded EM sensor in the reader could be controlled in practical applications.
Variations due to card positioning in a contact smart-card reader could be simulated
very realistically using a motorized XY stage to control probe positioning.

5.2.1.3  Scalability to larger databases.  The RF-DNA fingerprinting
technique performed very well for the 40 near-identical devices tested in this work
for both verification and identification tasks. It is believed that the MDA approach
employed is scalable to much larger databases of devices, at least for the verification
task.  One  limitation  of  the  MDA  technique  employed  herein  is  that  it  requires 
an  initial  number  of  features  that  exceeds  the  number  of  total  potential  classes. 
For  identification,  the  straightforward  MDA  implementation  will  encounter  practical 
limits  as  the  number  of  classes  exceeds  the  number  of  available  (or  computationally 
feasible)  features.  However,  for  identity  verification  applications  of  IC  fingerprinting, 
many  smaller  databases  can  be  used  to  ensure  a  single  verification  set  does  not 
violate  the  MDA  requirements.  Since  in  an  authentication  scenario  the  claimed 
device  identity  is  known  prior  to  attempting  the  classification,  only  the  sub-database 
containing the claimed identity need be considered. This solution
could be investigated using simulation to reduce the manual workload and cost associated
with fingerprinting a large number of devices.
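
The sub-database idea can be sketched as follows. This is only an illustration of the bookkeeping, assuming each sub-database must hold fewer device classes than the number of available features. The helper names `build_sub_databases` and `sub_database_for` and the device labels are hypothetical, and no MDA training is performed here.

```python
def build_sub_databases(device_ids, n_features):
    """Split enrolled device IDs into groups that each satisfy the assumed
    MDA constraint: number of classes strictly below the number of features."""
    max_classes = n_features - 1
    if max_classes < 1:
        raise ValueError("need at least two features to form a sub-database")
    ids = list(device_ids)
    return [ids[i:i + max_classes] for i in range(0, len(ids), max_classes)]


def sub_database_for(claimed_id, sub_databases):
    """Return (index, members) of the only sub-database that must be
    consulted to verify a claimed identity."""
    for index, group in enumerate(sub_databases):
        if claimed_id in group:
            return index, group
    raise KeyError(f"device {claimed_id!r} is not enrolled")
```

With 100 enrolled devices and 41 features, for instance, the devices fall into groups of 40, 40, and 20, and a verification claim touches only one of them.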

5.2.1.4  Challenge-response sequence optimization.  The results herein
were obtained by arbitrarily designating several clock cycles of an overall operation
sequence as the response region. Although no statistical difference in performance
was observed during limited trials when the designated response region was varied to
include different sub-regions, additional performance improvements may be obtained
by  more  carefully  choosing  or  defining  the  response.  Future  research  into  this  area 
might  focus  on  a  rigorous  investigation  into  the  performance  of  fingerprints  based 
on  microcode  instruction  sequences  that  exercise  different  device  subcircuitry. 

5.2.1.5  Effectiveness for programmable or custom logic ICs such as FPGAs
or ASICs.  This work evaluated the effectiveness of the RF-DNA fingerprinting
technique for low-cost commercial microcontrollers typical of those used
in a wide variety of modern security applications. However, the ICs tested were
fabricated using a 180 nm lithography process. An interesting extension of this work
would be to conduct additional experiments to confirm the technique’s suitability for
other classes of devices such as FPGAs or custom ASICs, as well as those fabricated
using more modern fabrication processes with much smaller feature sizes.

5.2.1.6  Performance improvement through use of multiple fingerprints.
Although it is anticipated that achievable SNRs for most applications of this technique
will be very high in practice since the emissions are captured using a near-field
probe, it is believed that identification and verification accuracies could be substantially
improved over the results herein by considering multiple extracted fingerprints
for  the  classification  decision.  It  would  be  worthwhile  to  examine  how  much  further 
performance  could  be  increased,  particularly  for  degraded  SNR  conditions,  by  either 
averaging  multiple  signals  or  by  extending  the  Bayesian  classification  technique  by 
iteratively  classifying  additional  fingerprints  until  a  desired  confidence  in  the  decision 
is  reached. 
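
One possible form of the iterative option is sketched below. It is a simplified illustration rather than the classifier used in this work: it assumes Gaussian class-conditional densities with known class means and a shared isotropic variance in the projected feature space, a uniform prior, and independent fingerprints. The function name `sequential_classify` and its parameters are hypothetical.

```python
import numpy as np


def sequential_classify(fingerprints, class_means, sigma=1.0, confidence=0.99):
    """Fuse fingerprints one at a time until the posterior is confident enough.

    Returns (predicted_class, posterior_probability, fingerprints_used).
    """
    class_means = np.asarray(class_means, dtype=float)
    fingerprints = np.asarray(fingerprints, dtype=float)
    if fingerprints.ndim != 2 or fingerprints.shape[0] == 0:
        raise ValueError("need at least one fingerprint")
    n_classes = class_means.shape[0]
    log_post = np.full(n_classes, -np.log(n_classes))       # uniform prior
    for used, f in enumerate(fingerprints, start=1):
        sq_dist = ((class_means - f) ** 2).sum(axis=1)
        log_post = log_post - 0.5 * sq_dist / sigma ** 2     # Gaussian log-likelihood
        log_post = log_post - np.logaddexp.reduce(log_post)  # renormalize
        posterior = np.exp(log_post)
        if posterior.max() >= confidence:
            break                                            # confident: stop early
    return int(posterior.argmax()), float(posterior.max()), used
```

The stopping rule trades collection time for confidence: under degraded SNR more fingerprints are consumed before the threshold is reached, while clean signals terminate after only a few.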


5.2.1.7  Investigation of alternate side-channels.  It is believed that
the same techniques employed in Chapter 3 would be similarly effective if the RF
emissions were replaced by other side-channel emissions such as variations in power
consumption. If the power side-channel is found to have similar performance, it may
be more suitable in some situations such as those where precise probe positioning is
difficult to achieve reliably. The experiments in Chapter 3 could easily be replicated
for  alternative  side-channels  by  simply  replacing  the  measured  RF  signal  with  the 
variations  in  power  consumption.  Power  consumption  variations  can  be  obtained 
using  a  typical  power-based  side-channel  analysis  setup  where  the  voltage  drop  is 
measured  across  a  resistor  placed  in-line  with  the  circuit’s  supply  and/or  ground  line 
as  illustrated  in  Sec.  EH 

5.2.2  Leakage Mapping.  Future work related to leakage mapping should
focus on continuing to enhance the procedure to provide a more robust overall leakage
assessment for arbitrary cryptographic implementations. A number of possible
improvements to the procedure were suggested in Chapter 4 and are expanded on
here.


5.2.2.1  Advanced Statistical Techniques.  In particular, the techniques
should be adapted to incorporate more advanced statistical tools such as mutual
information or linear regression models rather than depending on the fixed Hamming
Weight or Hamming Distance models. The non-parametric mutual information
approach is interesting because of the potential to capture non-linear information
leakages, but the computational complexity is high. Thus, any efforts to employ such an
approach would need to focus on identifying efficient techniques for computing the
required non-parametric statistical distributions to make the systematic application
of mutual information analysis (MIA) feasible. This problem may be suitable for the
application of parallel processing techniques using scientific high performance
computing clusters or graphics processing unit (GPU) acceleration.
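
To make the computational cost concrete, a simple histogram-based (plug-in) mutual information estimate for a single time sample is sketched below. It assumes equal-width bins and discrete leakage classes (for example, Hamming Weight values). The function name `mutual_information` and the default bin count are hypothetical choices, and a full assessment would repeat this estimate for every intermediate value and every sample.

```python
import numpy as np


def mutual_information(labels, samples, n_bins=16):
    """Plug-in estimate (in bits) of I(label; sample) from a 2-D histogram."""
    labels = np.asarray(labels).ravel()
    samples = np.asarray(samples, dtype=float).ravel()
    # Quantize the observed side channel sample into equal-width bins.
    edges = np.histogram_bin_edges(samples, bins=n_bins)
    binned = np.clip(np.digitize(samples, edges[1:-1]), 0, n_bins - 1)
    # Map arbitrary label values onto row indices 0..n_classes-1.
    classes, label_idx = np.unique(labels, return_inverse=True)
    joint = np.zeros((classes.size, n_bins))
    np.add.at(joint, (label_idx, binned), 1.0)
    joint /= joint.sum()
    p_label = joint.sum(axis=1, keepdims=True)
    p_sample = joint.sum(axis=0, keepdims=True)
    nonzero = joint > 0
    ratio = joint[nonzero] / (p_label @ p_sample)[nonzero]
    return float((joint[nonzero] * np.log2(ratio)).sum())
```

Because each sample is processed independently, the outer loops over samples and intermediate values are a natural target for the parallel processing mentioned above.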

Linear regression based techniques (e.g., [SLP05]) are also highly promising
because they can automatically adapt the leakage model to each considered sample
or instant in time. Thus, models constructed using regression techniques might
be considered to be leakage agnostic since they can adaptively capture the statistical
relationship between the observed signal variance at each sample and the data
of interest. It is believed that such an approach would be substantially more robust,
at the expense of significantly increased computational effort. An assessment
conducted  using  leakage  mapping  augmented  with  regression-based  models  would 
be  much  more  representative  of  the  types  of  DSCA  attacks  a  persistent,  adaptive 
adversary  might  employ.  Although  the  computational  requirements  would  increase 
significantly,  limited  initial  experiments  suggest  that  a  regression-based  approach  is 
feasible  for  the  leakage  mapping  framework. 
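
A regression-based model of this kind might be sketched as follows. The sketch assumes a linear basis consisting of a constant term plus the eight bits of a byte-sized intermediate value, fitted independently at every time sample by ordinary least squares. The function name `fit_bitwise_leakage` is hypothetical, and this is not the implementation used in the initial experiments mentioned above.

```python
import numpy as np


def fit_bitwise_leakage(values, traces):
    """Fit trace[:, s] ~ c0 + sum_j c_j * bit_j(value) for every sample s.

    values: (Nt,) byte-valued intermediate results
    traces: (Nt, Ns) observed side channel samples
    Returns (coefficients of shape (9, Ns), R^2 per sample of shape (Ns,)).
    """
    values = np.asarray(values, dtype=np.int64)
    traces = np.asarray(traces, dtype=float)
    bits = (values[:, None] >> np.arange(8)) & 1
    design = np.hstack([np.ones((values.size, 1)), bits.astype(float)])
    coef, *_ = np.linalg.lstsq(design, traces, rcond=None)
    residual = traces - design @ coef
    ss_res = (residual ** 2).sum(axis=0)
    ss_tot = ((traces - traces.mean(axis=0)) ** 2).sum(axis=0)
    return coef, 1.0 - ss_res / ss_tot
```

The per-sample coefficient of determination plays a role comparable to the squared correlation coefficient in the fixed-model case, but the bit weights are learned rather than assumed equal as in the Hamming Weight model.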


5.2.2.2  Suitability for Assessing Protected Implementations.  Although
some pilot studies were conducted to evaluate the suitability of leakage
mapping against protected cryptographic implementations, the results so far are
very limited due to the limited availability of protected implementations for study.
To truly assess the suitability of the leakage mapping approach as a tool for making
appropriate design decisions in the development of a protected implementation, it
should be used directly in that environment. That is, it should be employed in an
iterative fashion as a cryptographic implementation is designed and various countermeasures
are applied to determine how well it guides the decision-making process.
At each step, it is also recommended that the leakage mapping results be validated
against well-known attack techniques as in Chapter 4.


5.2.2.3  Frequency Domain Leakage Mapping.  For this research, the
leakage mapping approach was applied exclusively to signals in the time domain.
However, various other work has suggested that spectral-domain techniques may be
even more effective, particularly against some implementations that employ basic
countermeasures such as random process interrupts [RO04b]. In general, the only
changes that would be necessary to apply leakage mapping to signals in the frequency
or time-frequency domains would be input signal pre-processing and interpretation of
the resulting correlation matrices. Spectrogram or similar pre-processing techniques
would be particularly interesting since the leakage assessment produced from
time-frequency domain signals would identify both problematic instants in time as well
as the frequency bands containing the majority of the leaked information.
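
The pre-processing step could look roughly like the sketch below, which assumes each row of the trace matrix is one recorded operation. The function names `magnitude_spectra` and `short_time_spectra`, the Hann window, and the frame parameters are hypothetical choices rather than settings evaluated in this research.

```python
import numpy as np


def magnitude_spectra(traces):
    """Replace each time-domain trace (row) by its one-sided magnitude spectrum."""
    traces = np.asarray(traces, dtype=float)
    return np.abs(np.fft.rfft(traces, axis=1))


def short_time_spectra(traces, frame_len, hop):
    """Simple spectrogram: windowed magnitude spectra of overlapping frames.

    Returns an array of shape (Nt, n_frames, frame_len // 2 + 1).
    """
    traces = np.asarray(traces, dtype=float)
    window = np.hanning(frame_len)
    starts = range(0, traces.shape[1] - frame_len + 1, hop)
    frames = [
        np.abs(np.fft.rfft(traces[:, s:s + frame_len] * window, axis=1))
        for s in starts
    ]
    return np.stack(frames, axis=1)
```

The magnitude spectrum of a trace is unchanged by a circular time shift, which gives some intuition for why spectral features can tolerate the misalignment introduced by random delays. The resulting arrays would simply take the place of the time samples as the columns of the leakage matrix.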


5.2.3  Related  Future  Research  Recommendations.  In  the  course  of  this 
work, a large number of experiments were conducted that led to a variety of interesting
observations that have not yet been fully investigated. One observation
in particular, discovered during development of the leakage mapping procedure,
merits further investigation.


It was emphasized in Section 4.4 that a significant advantage of the systematic
leakage  mapping  approach  is  that  it  can  help  to  prevent  the  inadvertent  oversight  of 
important  leakages.  Because  leakage  is  assessed  across  all  intermediate  computations 
during  the  entire  cryptographic  algorithm,  any  unexpected  leakages  are  highlighted 
when  the  full  assessment  results  are  reviewed. 


The software-based AES implementation (Imp. A) studied in Chapter 4 exhibits
such unexpected leakages. For this software implementation, direct key correlation
would typically be expected twice in each round. This is because each round
key  is  manipulated  during  the  ARK  operation  and  during  on-the-fly  key  scheduling. 
Strangely,  several  bytes  of  the  master  key  exhibit  leakage  during  rounds  other  than 
the  first  round  during  which  the  master  key  also  serves  as  the  round  key.  Based  on 
knowledge  of  the  AES  algorithm,  this  leakage  is  unexpected  since  the  master  key  is 
not  typically  manipulated,  directly  or  indirectly,  at  those  times.  Examination  of  the 
available  source  code  for  this  implementation  confirmed  that  the  master  key  was  not 
manipulated  at  the  times  when  the  unexpected  leakage  occurred. 

The  source  of  the  key  leakages  was  identified  by  estimating  the  clock  cycle 
that  corresponds  to  each  unexpected  leakage  (using  the  MPLAB  PIC  simulator) 
and  determining  which  instructions  are  being  executed  at  those  times.  This  analysis 
indicated that the source of the observed leakage is ARK operations that mix each
round’s input with the associated round key. Further investigation revealed that
the temporary round key is stored in a block of memory addresses that immediately
precedes the memory addresses where the master key is stored. This suggests that the
content  of  the  master  key  memory  addresses  is  being  leaked  without  ever  explicitly 
accessing  the  addresses  that  contain  the  leaked  information.  The  physical  cause  of 
this  leakage  is  currently  unknown,  and  additional  experimentation  is  necessary  to 
determine  the  root  cause.  Notably,  even  without  identifying  the  cause,  these  leakages 
could  easily  be  avoided  by  adding  a  buffer  around  the  sensitive  data  in  memory  to 
prevent  any  accesses  to  adjacent  memory  locations. 
Appendix A.  Side Channel Analysis Countermeasures


A.1  Constant Time


Kocher,  in  his  original  paper  on  timing  attacks,  suggested  a  number  of  possible 
approaches  system  developers  can  take  to  reduce  the  vulnerability  of  their  systems 
to similar attacks [Koc96]. The most obvious approach to counteract timing attacks
is  to  make  all  operations  take  constant  time. 

Realizing this approach has proven to be difficult in practice, and some operations
that were long thought to resist timing attacks (e.g., table lookups) have later
proven to be vulnerable, as described in Section 2.5.1. When candidate algorithms were
evaluated for selection as the new AES algorithm, vulnerability to SCA attacks was
one of the criteria considered, but table lookup based implementations were considered
to be resistant to timing attacks. It has since been shown that, at least on
general purpose microprocessor implementations, cache architectures make that an
invalid assumption, as described in Section 2.5.1 [Ber05].


Additionally, as Kocher originally noted, ensuring the intended output is produced
in constant time is insufficient to prevent timing attacks since other measurable
phenomena can reveal the time taken by intermediate operations [Koc96]. Finally,
constant time countermeasures have the undesirable effect of making all operations
take the longest time, effectively de-optimizing the performance to the lowest common
denominator [Roh06, MOP07].
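
A small software illustration of the constant-time idea is comparison of a secret value against an input. The sketch below contrasts a data-dependent early-exit comparison with two alternatives that process every byte. The function names are hypothetical, and an interpreted language gives no strict timing guarantees, so this illustrates the principle only. `hmac.compare_digest` is the Python standard library routine intended for this purpose.

```python
import hmac


def early_exit_equal(a: bytes, b: bytes) -> bool:
    """Leaky comparison: running time depends on the first mismatching byte."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False        # returns sooner when the mismatch is earlier
    return True


def accumulate_equal(a: bytes, b: bytes) -> bool:
    """Processes every byte regardless of where the inputs differ."""
    if len(a) != len(b):
        return False
    diff = 0
    for x, y in zip(a, b):
        diff |= x ^ y           # accumulate differences without branching on data
    return diff == 0


def library_equal(a: bytes, b: bytes) -> bool:
    """Standard library comparison designed to resist timing analysis."""
    return hmac.compare_digest(a, b)
```

All three functions return the same results. They differ only in how the work performed depends on the data, and the accumulating form shows the cost noted above: the worst-case amount of work is always done.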


A.2  Constant Power Countermeasures

One way to reduce the information signal produced through the power
consumption side channel is for all operations to require constant power. If operations
take precisely the same amount of power without regard for the data being processed,
then power analysis attacks are no longer possible and the adversary must seek an
alternate (hopefully less informative) source of information leakage. This approach,
unlike some of the others, must be implemented at the hardware level, and is
therefore not applicable to software-based implementations that run on general-purpose
devices such as microprocessors. The term commonly used for constant-power logic
design styles is secure logic.


Several secure logic variations have been proposed, which fall into two
primary categories: transistor-level [TAV02, BGLT06] and gate-level logic styles
[TV04, PM05]. Transistor-level logic styles typically perform better in the sense
that they are more effective at reducing power consumption fluctuations. However,
they typically cannot be implemented using existing commercial design flows and
standard cell libraries. Therefore, the cost and time required to develop systems
based on these logic styles are much higher than for a standard CMOS design. In
contrast, gate-level secure logic assembles compound constant-power logic cells from
existing standard CMOS logic cells (e.g., AND and OR gates). These logic styles can
generally be implemented using existing design flows, commercial tools, and standard
cell libraries, but are less effective at reducing variations in power consumption.
In general, all of the proposed secure logic styles sacrifice power, size, and/or
performance to reduce the side-channel information leakage of the circuit's power
consumption.


Both the transistor-level and gate-level logic styles are based on the idea of
dual-rail or multi-rail logic. Figure A.1 illustrates a transistor-level pre-charged
dual-rail domino style that achieves near-constant power consumption profiles for
all inputs and outputs.


The logic cell requires an input pair of both the true signal and its complement
for each input. Likewise, it produces an output signal pair consisting of both true
and complementary output values of the logic function, where $Y_h$ represents the
output and $Y_l$ is its complement. While the clock $\phi$ is low (or logical '0'),
both output signals are pre-charged to a low voltage. When the clock $\phi$ goes high,
the circuit evaluates and one (and only one) of the two output signals is asserted.
During proper operation, both output signals should never be '1' simultaneously.

Figure A.1 Dual Rail Domino Logic Style with Completion Detection for
Asynchronous Design [WH05].


Table A.1 summarizes the states for a dual-rail domino logic cell. An example of an
XOR gate implemented in dual-rail domino logic is shown in Figure A.2.


Table A.1 Dual-rail domino signal encoding [WH05]

sig_h   sig_l   Meaning
0       0       precharged
0       1       '0'
1       0       '1'
1       1       invalid

The idea behind the dual-rail design as a countermeasure is that, because the
logic cell always computes both the true and complementary outputs, one half or the
other of the complementary paths will be exercised during any particular operation.
Thus, the power the logic cell draws should be essentially independent of the inputs
and outputs.
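
The balancing argument can be made concrete with a small Python sketch. It assumes a deliberately crude power model in which each wire that changes level costs one unit; the helper names are hypothetical, and the encoding follows Table A.1. Under this assumed model a single-rail wire toggles only when the value is '1', whereas the dual-rail pair always makes exactly one transition out of the precharged state.

```python
PRECHARGED = (0, 0)  # (sig_h, sig_l) while the clock is low


def dual_rail(bit):
    """Encode a logic value as (sig_h, sig_l) following Table A.1."""
    return (1, 0) if bit else (0, 1)


def transitions(before, after):
    """Count the wires whose level changes between two states."""
    return sum(p != q for p, q in zip(before, after))


for bit in (0, 1):
    single_rail_cost = transitions((0,), (bit,))          # depends on the data
    dual_rail_cost = transitions(PRECHARGED, dual_rail(bit))  # always one wire
    print(bit, dual_rail(bit), single_rail_cost, dual_rail_cost)
```

The model ignores the wire capacitances, so it cannot show the routing imbalance discussed later in this section; it only shows why the number of switching events is data independent.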

Note that the dual-rail logic style means that the logic can signal completion
by tying both un-inverted outputs to a NAND gate, as shown in Figure A.1. In
this manner, domino dual-rail can be used for asynchronous logic, which has been
proposed as a further countermeasure to DSCA attacks.

Figure A.2 Dual Rail Domino XOR/XNOR gate (without completion detection
explicitly shown) [WH05].

A  limitation  of  this  approach  is  that  routing  can  create  capacitive  imbalances 
in  the  cell,  which  can  bias  the  power  consumption.  Several  specific  variations  on  the 
dual-rail  logic  concept  have  been  proposed  in  the  literature. 


•  Sense amplifier based logic (SABL) [TAV02]. A transistor-level custom
synchronous logic style with a single switching event each clock cycle, during which
it "...discharges and charges the sum of all the internal node capacitances
together with one of the balanced output capacitance" [TV04]. SABL requires
the design of an all-new custom cell library.

•  Wave dynamic differential logic (WDDL) [TV04]. A gate-level semi-custom
logic style that combines logic gates from a standard cell library into compound
secure gates with constant power characteristics similar to SABL. WDDL can
be implemented for both ASIC and FPGA-based designs. In general, WDDL is
less effective than SABL at hiding power consumption variations due to internal
computations, and unbalanced routing of complementary wires can render the
countermeasure less effective at reducing data-dependent power fluctuations.
The technique can be integrated into an existing design flow using commercial
EDA tools and commercially available complementary CMOS standard cell
libraries with some modifications (semi-custom design flow) [TV06].

•  Masked dual-rail pre-charge logic (MDPL) [PM05]. Another gate-level
semi-custom logic style formed from the combination of standard cells that attempts
to overcome routing constraints. MDPL combines power equalization with
intermediate value masking (see Section A.4).

•  Three-phase dual-rail pre-charge logic (TDPL) [BGLT06]. A transistor-level
enhancement to SABL intended to provide constant power consumption
without restrictive routing constraints. TDPL also requires the design of a
custom cell library.

Another area of active research is the efficient integration of full-custom or
semi-custom logic styles into existing design workflows. Baddam and Zwolinski proposed
a technique to effectively route circuits following the divided WDDL logic style using
backend duplication [BZ08]. Their technique is interesting because it can reportedly
be applied to FPGAs as well as custom ASIC designs. The authors presented
experimental results from a 130 nm technology FPGA demonstrating the technique's
effectiveness.

Although hardware-based secure logic styles are promising in theory, achieving
equalized power has proven to be very difficult in practice. Suzuki and Saeki
showed that slight variations in gate input arrival times result in the complementary
paths switching at different times, which results in exploitable power consumption
leakage [SS06]. Mangard, Popp and Gammel showed that all of the fully
complementary CMOS-based logic styles exhibit glitching behavior with similar
characteristics [MPG05]. Schaumont and Tiri have recently shown that the
combination of masking and dual-rail logic (e.g., MDPL), which was thought to be one
of the most promising hardware countermeasures to prevent DPA, is fundamentally
insecure [ST07]. Small differences in the routing of complementary wire pairs lead
to imbalanced capacitances. Though these implementations are not directly
vulnerable to traditional DSCA-type attacks, the routing imbalances induce a bias in
the probability density function of the power consumption which can be filtered to
remove the mask, resulting in a much lower DPA resistance than originally thought.
The authors were able to extract the key from their AES implementation, originally
believed to be highly resistant to standard DPA attacks, with approximately 2,000
observations.


A.3 Temporal Desynchronization

Noise  addition  is  an  intuitive  way  to  cover  up  information  leakage  of  a  circuit. 
The  concept  is  that  if  noise  can  be  increased  enough,  the  attacker  will  be  unable  to 
identify  and  extract  the  underlying  information  signal.  One  way  of  increasing  the 
noise  in  a  side  channel  is  to  desynchronize  the  time  when  sensitive  data  manipulations 
take  place  from  one  operation  to  another.  This  can  be  accomplished  by,  for  instance, 
randomizing  the  clock  period,  the  order  of  operations,  or  the  number  of  cycles  taken 
for a microprocessor instruction [Koc96, Roh06, MOP07].

The  theory  behind  these  randomization  techniques  is  that  DSCA  attacks  are 
sensitive  to  the  temporal  alignment  of  the  side  channel  traces.  Variations  in  the 
start  time  of  individual  traces,  or  variations  in  the  time  when  subsequent  sensitive 
operations  or  data  manipulations  occur  relative  to  that  start  will  introduce  noise 
into  the  corresponding  side-channel  signal.  The  effect  on  DSCA  techniques  is  that 
the  peaks  in  the  correlation  coefficients  (or  whatever  statistical  tool  is  used)  which 
normally  indicate  a  correct  key  are  spread  across  a  number  of  points  in  time.  In 
digital signal processing this is known as incoherent averaging [CCD00]. If the
starts of the traces are not temporally aligned, the adversary must either collect
more samples or use sophisticated signal processing techniques to align them (see
Section 2.7) [MOP07].
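
The effect of misalignment on averaging can be sketched in a few lines of Python. The sketch assumes an idealized noise-free trace containing a single unit-height leakage spike, and it replaces random delays with a deterministic cycle through four offsets so the outcome is reproducible; the trace length, spike position, and jitter width are arbitrary illustrative values.

```python
def make_trace(length, leak_at, amplitude=1.0):
    """Idealized trace: zero everywhere except one leakage spike."""
    trace = [0.0] * length
    trace[leak_at] = amplitude
    return trace


def average(traces):
    """Point-wise mean across traces, as used when combining observations."""
    count = len(traces)
    return [sum(column) / count for column in zip(*traces)]


LENGTH, LEAK_AT, JITTER, COUNT = 32, 10, 4, 400

aligned = [make_trace(LENGTH, LEAK_AT) for _ in range(COUNT)]
# Stand-in for random delays: each trace is shifted by 0..JITTER-1 samples.
jittered = [make_trace(LENGTH, LEAK_AT + (j % JITTER)) for j in range(COUNT)]

peak_aligned = max(average(aligned))
peak_jittered = max(average(jittered))
print(peak_aligned, peak_jittered)
```

With the spike spread evenly over four positions, the averaged peak falls to one quarter of its aligned height, which is why the adversary needs either more traces or a realignment step.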


An alternate technique to achieve temporal desynchronization is to implement
circuits using asynchronous or self-timed logic styles. Removing the clock as a source
of information makes the task of side-channel analysis much more difficult and has
the added benefit of countering clock-based fault attacks [FMP03]. In addition to
the properties of constant power consumption that can be achieved through careful
logic design, the dual-rail logic styles introduced above also lend themselves to
asynchronous design, since each cell inherently signals the completion of an
operation [FMP03].


A.4 Masking countermeasures

Masking  countermeasures  are  a  class  of  techniques  that  protect  sensitive  infor¬ 
mation  from  side  channel  attacks  by  making  the  physically  observable  phenomena 
independent  of  sensitive  data.  Masking  of  asymmetric  ciphers  is  commonly  referred 
to as blinding in the cryptographic literature [Koc96, MOP07].


Kocher first proposed applying blinding as a countermeasure to timing
attacks [Koc96], but the technique is also effective at reducing the vulnerability of
systems to other side channel attacks [CJRR99, Roh06, MOP07]. Kocher's blinding
technique for protecting the private key is described in detail in Appendix D.

The generalized application of masking to side channel attacks was proposed
by Chari et al. as a derivative of the already known secret sharing technique from
the cryptographic literature [CJRR99]. Masking, in general, is accomplished by
randomizing sensitive intermediate values processed by the device. The idea is to
randomly split sensitive intermediate values into a number of shares, each of which
is independent of the original sensitive data. Each sensitive unmasked intermediate
value, $v$, is split into $d$ shares where the relation

$$m_1 * m_2 * \cdots * m_d = v \qquad \text{(A.1)}$$

holds for some group operation $*$. The shares $m_1, \ldots, m_{d-1}$ are the random
masks, and $m_d$ is the masked intermediate value. The shares $m_1, \ldots, m_{d-1}$
are assumed to be mutually independent random variables uniformly distributed over
the domain of $v$. Typical choices for the masking operation, $*$, are the logical
XOR (boolean masking) and modular addition or multiplication (arithmetic masking)
[MOP07, PRB09].
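
Equation (A.1) with XOR as the group operation can be sketched as follows. This is a minimal illustration of Boolean share splitting under the assumption of 8-bit intermediate values; the function names and the 8-bit width are hypothetical choices, and Python's standard secrets module stands in for the device's random number source.

```python
import secrets
from functools import reduce
from operator import xor


def mask(value, d, bits=8):
    """Split value into d Boolean shares whose XOR equals value."""
    masks = [secrets.randbits(bits) for _ in range(d - 1)]  # m_1 .. m_{d-1}
    masked = reduce(xor, masks, value)                      # m_d
    return masks + [masked]


def unmask(shares):
    """Recombine the shares: m_1 xor m_2 xor ... xor m_d = v."""
    return reduce(xor, shares, 0)


v = 0x3A
shares = mask(v, d=3)
print(shares, hex(unmask(shares)))
```

Each run produces different shares for the same value of v, which is the property the countermeasure relies on: the device only ever handles the individual shares, never v itself.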

Masking does not make physically observable phenomena independent of the
device operations, but rather randomizes the data on which the device is
operating. Thus, the device still leaks information about the intermediate values being
processed, but those intermediate values are independent of the secret or private
key. An adversary with no knowledge that the masking countermeasure has been
implemented will be unable to attack the device using standard SCA techniques.
Even with knowledge that masking is used, the difficulty of mounting an attack is
significantly increased.


Most of the masking schemes proposed use a single random mask to split
the sensitive intermediate value into just two shares, although higher-order
masking schemes have also been proposed to provide further security against HO-DSCA
techniques [CJRR99, MOP07, PRB09]. The complexity of both implementing and
attacking higher-order masking schemes grows quickly with the order, so most research
has focused on second-order masking and 2O-DSCA attacks [PRB09].


Masking countermeasures are sometimes combined with constant-power logic
styles, but that combination has been found to be fundamentally insecure, as
described in Section A.2.


A.4.1 Masking of complex ciphers. A key characteristic of masking
countermeasures is that they require changes to the underlying algorithm to handle the
masked intermediate values properly. For RSA, the changes are straightforward due to
the properties of modular arithmetic, but masking schemes can be substantially less
straightforward for more complicated ciphers. For example, masking S-box lookups
introduces significant overhead (performance and memory) since the masked
equivalent of the entire table would have to be calculated for each new mask [MOP07].
More efficient implementations are possible if the S-box functions are computed
dynamically using finite field arithmetic, since the entire table does not have to be
recomputed for each new mask value [OMPR05].

A.4.2 Implementation levels for masking. In addition to masking
intermediate values at the algorithmic level (in either hardware or software), it is also
possible to mask operations at higher system levels. Mangard et al. describe
several techniques for masking hardware multipliers, combinational logic, and data
buses [MOP07]. Various proposals have been made for masked logic styles to be
implemented at the custom hardware design level for both ASIC and FPGA
implementations.


A.4.3 Practical considerations. The security benefits of masking
countermeasures can be easily lost if the technique is implemented carelessly. Mangard et
al. warn of several possible pitfalls when implementing masking schemes in
practice. Consecutively storing a masked value and its corresponding mask in the same
register, or inadvertently transferring them consecutively across a data bus, may leak
the Hamming distance between the two and reveal the unmasked intermediate value.
Likewise, intermediate values that are concealed by the same mask and processed
consecutively can leak the Hamming distance between the two unmasked
intermediate values. Simultaneously processing a masked intermediate value and the
mask used to protect it can unintentionally occur in parallel hardware
implementations, effectively leaking the unmasked intermediate value. Finally, implementation
tools (compilers, hardware synthesis tools, etc.) may inadvertently optimize away
countermeasures if care is not taken [MOP07].

To be effective, random masks should be changed frequently enough to prevent
the masks themselves from introducing exploitable side-channel leakage. However,
every mask change reduces performance of the system, and thus the frequency of
changes must be balanced to achieve acceptable performance and security [MOP07].

A.5 Power Supply Shielding


A countermeasure to power analysis techniques is to isolate the internal
time-varying power requirements of a circuit from easily probed terminals. Shamir was
the first to propose physically isolating a circuit from its power supply by using
large shielding capacitors [Sha00]. Others have noted that capacitors used in power
distribution networks of complex integrated circuits such as FPGAs filter some of
the higher-frequency components of the internal switching activity [MOP07]. Shamir's
approach used two capacitors in tandem so that one could supply the circuit while
the other recharged.


Ratanpal, Williams and Blalock pointed out that Shamir's technique can be
bypassed by probing the power supplied by the capacitors instead of the global power
supply (assuming the capacitors are off-chip or accessible), reconstructing a full
power consumption trace by combining the power traces from the two capacitors,
and applying standard DPA attack techniques against the combined trace [RWB04].
The authors proposed a more effective active signal suppression circuit. A key
disadvantage of all power supply shielding techniques is that they are only effective
against power analysis and do not protect against attacks on the EM or other side
channels.


A.6 EM Shielding

There are few countermeasures that specifically address the problem
of EM leakage. Quisquater et al. suggest several possible ways to reduce the
vulnerability of systems to EM analysis, including reducing the magnitude of the radiated
electromagnetic field through shielding (such as thick metal packaging), enclosing
the circuitry inside a Faraday cage to prevent EM radiation from escaping,
reducing the power consumption of the device (and thus the resulting EM radiation),
using asynchronous logic, or using one of the various dual-rail logic styles proposed
elsewhere as a countermeasure to power analysis attacks [QS00, QS01]. Introducing
temporal or spatial jitter with techniques such as the ones described in Sections
A.3 and A.8 may also improve resistance to EM-based DSCA attacks. The
Faraday cage approach is likely to be effective, but may not be realistic for practical
implementations of many systems.


A.7 Leakage-Resistant Arithmetic


Bajard et al. proposed a leakage-resistant arithmetic (LRA) style based on the
residue number system (RNS) as a countermeasure to timing, SSCA and DSCA
attacks [BILT04]. In the RNS representation, large integers are represented by a
series of smaller integers. The system is defined by a set of relatively prime moduli,
$\{m_1, m_2, \ldots, m_N\}$. Any integer $X$ smaller than the product of the moduli is
represented as a set of $N$ smaller integers $\{x_1, x_2, \ldots, x_N\}$ such that
$x_i = X \bmod m_i$. The LRA system randomizes
data, order of computations, and the function of individual logic cells by randomly
selecting the initial set of moduli used and/or randomly changing the base moduli
before and during an arithmetic operation.
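
The RNS representation itself can be sketched in Python as follows. This is only an illustration of the number system, not of the LRA randomization scheme; the four small moduli are arbitrary illustrative choices, the function names are hypothetical, and the reconstruction step uses the standard Chinese Remainder Theorem, which the text above does not spell out.

```python
from math import prod

MODULI = (13, 17, 19, 23)  # illustrative pairwise coprime moduli


def to_rns(x, moduli=MODULI):
    """Represent x by its residues x_i = x mod m_i."""
    return tuple(x % m for m in moduli)


def rns_mul(a, b, moduli=MODULI):
    """Multiply two RNS numbers channel by channel, with no carries between channels."""
    return tuple((x * y) % m for x, y, m in zip(a, b, moduli))


def from_rns(residues, moduli=MODULI):
    """Recover the integer from its residues via the Chinese Remainder Theorem."""
    big_m = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        partial = big_m // m
        total += r * partial * pow(partial, -1, m)  # modular inverse (Python 3.8+)
    return total % big_m


x, y = 123, 456
product = from_rns(rns_mul(to_rns(x), to_rns(y)))
print(to_rns(x), to_rns(y), product)
```

Because each residue channel is processed independently, the channels can be computed in parallel, and a different choice of MODULI changes every intermediate value while leaving the reconstructed result unchanged.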


The  countermeasure  is  most  suitable  for  public-key  ciphers  that  operate  over 
large  finite  fields  such  as  RSA  or  ECC.  The  authors  developed  a  reference  implemen¬ 
tation  of  a  reconfigurable  logic  design  to  implement  LRA  for  the  RSA  encryption 
algorithm.  The  resulting  implementation  takes  approximately  5-7  times  more  gates 
than  a  standard  modular  exponentiation  implementation,  but  the  inherently  paral¬ 
lel  nature  of  the  arithmetic  operations  offers  potential  performance  improvements 
at  larger  key  sizes  (>  2048  bits).  Although  the  approach  presents  some  intuitive 
degree  of  protection  against  SCA  attacks,  no  experimental  results  were  presented  to 
validate  the  claim  of  improved  SCA  resistance. 


A.8 Dynamic Reconfiguration

One interesting new area of countermeasures research is the idea of
dynamically reconfiguring FPGAs or other devices in real time, as introduced by Mentens
et al. [MGV08]. The countermeasure creates temporal and spatial jitter by
randomly inserting delay registers into the computation path and changing the location
where the logic that processes sensitive operations is instantiated. The temporal
jitter desynchronizes the timing of sensitive operations, as previously discussed in
Section A.3. Spatial jitter increases resistance to EM attacks, which typically
attempt to localize the area of maximal EM leakage. Fixed-configuration spatial jitter
was also previously introduced by Bajard et al. and Ciet et al., but requires
multiple dedicated instantiations of the elementary computational cells, which increases
overhead [BILT04, CNPQ03]. Dynamic reconfiguration may achieve similar spatial
protection with improved area and resource utilization efficiency.

Appendix B. Kocher's Timing Attack on RSA

B.1 Overview

The  seminal  paper  on  cryptanalytic  SCA  attacks,  published  by  Kocher  in  1996, 
targets  the  timing  variations  of  several  asymmetric  ciphers  including  the  popular 
RSA  algorithm.  This  section  describes  Kocher’s  attack  on  RSA. 


B.2  Overview  of  RSA 

RSA use is widespread because of its strong computational security and ease of
implementation [Sch96]. Asymmetric ciphers such as RSA are typically much slower
than block ciphers such as DES or AES, and their key lengths are typically much
larger: 1024 bits or higher for RSA vs. 256 bits maximum for AES. Because they
are relatively slow, asymmetric ciphers are not generally used for sustained transfers
of large amounts of data, but are commonly used for applications such as establishing
a secure session between two parties over a computer network (e.g., the Secure Sockets
Layer (SSL) protocol used for secure Internet browser sessions) or for encrypting or
signing emails.

In RSA and other asymmetric ciphers, the ciphering process is based on the
mathematical properties of modular exponentiation:

$$R = d^k \bmod m \qquad \text{(B.1)}$$

where $d$ is the data being operated on, $m$ is a publicly known modulus, and $k$ is
the cryptographic key.

In RSA and other asymmetric ciphers, cryptographic keys are generated in
pairs: a public key $k_{pub}$, which is used to encrypt data, and a corresponding
private key $k_{pri}$, which can decrypt the data that was encrypted using $k_{pub}$.
A public key is published as the set $(k_{pub}, m)$ and is shared freely with anyone
from whom the key's owner would like to be able to receive secure messages.
Asymmetric ciphers used in this manner are typically referred to as public-key
encryption schemes [Sch96].
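
As a worked instance of Equation (B.1), the following Python sketch encrypts and decrypts one value with textbook-sized parameters. The specific numbers are illustrative assumptions and far too small for real use, and the derivation of the private exponent as a modular inverse is the standard RSA construction rather than something taken from this appendix.

```python
# Toy parameters for illustration only.
p, q = 61, 53
m = p * q                                   # public modulus
k_pub = 17                                  # public exponent
k_pri = pow(k_pub, -1, (p - 1) * (q - 1))   # private exponent (Python 3.8+)

d = 65                                      # data block, must be smaller than m
ciphertext = pow(d, k_pub, m)               # R = d^k_pub mod m
recovered = pow(ciphertext, k_pri, m)       # R = c^k_pri mod m

print(m, k_pri, ciphertext, recovered)
```

Both directions are the same modular exponentiation with a different exponent, which is why the implementation of that one operation determines the side-channel behavior of the whole cipher.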

The mathematical details of how RSA keys are generated and how and why the
ciphering process works are irrelevant to Kocher's timing attack and therefore are not
discussed here, but are described in detail by a number of references [Sch96]. It is
noteworthy that many of the RSA implementations in use today are based largely
on the C source code from the original RSA Laboratories reference
implementation [Lab94].


B.2.1 Implementation of RSA. In practice, the modular exponentiation
operations used in RSA and other similar ciphers are generally implemented using
one of several well-known techniques. One common algorithm, which is used as an
example in Kocher's paper, is the binary square and multiply method (also known
as addition chaining) shown in Algorithm 4 [Koc96, Sch96].


Algorithm 4 Square and Multiply Modular Exponentiation (adapted from [Koc96])

s_0 ← 1
for i = 0 to w − 1 do
    if k_i = 1 then
        R_i ← (s_i × y) mod n
    else
        R_i ← s_i
    end if
    s_{i+1} ← R_i^2 mod n
end for
return (R_{w−1})
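
Algorithm 4 translates directly into Python. The sketch below is a minimal transcription with hypothetical function names; it assumes the key bits are supplied in the order the loop consumes them, which for the algorithm as written means most significant bit first, and it checks the result against Python's built-in pow.

```python
def square_and_multiply(y, key_bits, n):
    """Compute y**k mod n, with key_bits = [k_0, ..., k_{w-1}], most significant bit first."""
    s = 1
    r = 1
    for bit in key_bits:
        if bit == 1:
            r = (s * y) % n  # conditional multiplication: the timing-dependent step
        else:
            r = s
        s = (r * r) % n      # squaring is performed on every pass
    return r


def to_bits(k):
    """Binary expansion of k, most significant bit first."""
    return [int(b) for b in bin(k)[2:]]


y, k, n = 7, 2753, 3233
print(square_and_multiply(y, to_bits(k), n), pow(y, k, n))
```

The branch on each key bit is visible in the code as the single if statement; everything else in the loop runs regardless of the key.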


B.2.2 Information Leakage in Modular Exponentiation. There are at least
two sources of potential information leakage in this algorithm [Koc96]. The first,
which is the focus of Kocher's timing attack, is the conditional execution of the
modular multiplication on Line 4. The multiplication is executed only if $k_i = 1$. This
data dependency results in variations in computation time, which leak information
about the value of the key. Kocher's attack is based on analysis of
this first source of information leakage. A second source of information leakage is
due to variations within the modular multiplication step itself.

B.3  Key  Assumptions  of  Kocher’s  Attack 

With  all  side  channel  attacks,  it  is  important  to  be  aware  of  the  underlying 
assumptions  that  can  limit  the  attack’s  practicality.  Kocher’s  timing  attack  makes  a 
number  of  assumptions  about  the  attacker’s  capabilities,  which  are  described  here: 

1.  The  attacker  (Eve)  has  the  ability  to  monitor  the  decryption  operations  of  her 
target  (Bob),  and  to  accurately  record  the  computation  time  for  each  decryp¬ 
tion. 

2.  Eve  is  able  to  eavesdrop  on  and  capture  the  encrypted  messages  that  are  sent 
to  Bob.  (Alternatively,  in  some  scenarios  an  active  adversary  could  actually 
be  the  source  of  the  encrypted  messages.) 

3.  Eve  is  able  to  do  the  above  for  a  large  number  of  decryption  operations  where 
Bob  is  using  the  same  private  key. 

4.  Eve must have knowledge of the amount of time required by Bob to perform
the modular multiplication that occurs when a key bit is '1'. One way for
Eve to accomplish this is to create a simulator using the same source code
as the targeted implementation, running on a similar hardware platform. She
can then precisely simulate the amount of time each partial calculation should
have taken on her target platform.¹

¹ Realistically, this is not an impractical constraint from the perspective of an attacker. A
scenario where Eve may have this capability is if she is attacking one of the many widely used RSA
implementations that are based on the original RSAREF reference implementation (which is freely
available) or that use open-source implementations of other cryptographic libraries.

5.  Measurement error, loop overhead, and other computation time contributions
not directly attributable to the mathematical operations of modular
exponentiation are negligible.²

6.  Modular multiplication computation times are approximately normally
distributed. Kocher illustrates that this is the case for his targeted implementation
(RSAREF toolkit running on a 120 MHz Pentium computer with the MS-DOS operating
system) [Koc96].

7.  Modular multiplication computation times are independently distributed. In
fact, it has been shown that there is some correlation between multiplication
times in a straightforward square and multiply implementation, but the
approximations used for this attack are still useful in practice [Roh06].

B.4  The  Attack 

The actual attack is an iterative process whereby Eve attempts to guess the
bits $k_i$ of the full $w$-bit private key in the order they are used by the modular
exponentiation algorithm. The basic steps of Kocher's timing attack are:

1.  Eve  observes  Bob’s  decryption  of  n  cipher-texts. 

² If these factors are not negligible, Kocher's attack is not rendered ineffective, but the
noise increases. The result is that the attacker must gather more samples in order to average
out the underlying noise.

2.  For each decryption operation $j = 1, 2, \ldots, n$, Eve records the total
computation time, $T_j$. The total computation time for each individual observation
can be decomposed as:

$$T_j = e_j + \sum_{i=0}^{w-1} t_{i,j}, \qquad
\begin{pmatrix} T_1 \\ T_2 \\ \vdots \\ T_n \end{pmatrix} =
\begin{pmatrix}
t_{0,1} + t_{1,1} + \cdots + t_{w-1,1} + e_1 \\
t_{0,2} + t_{1,2} + \cdots + t_{w-1,2} + e_2 \\
\vdots \\
t_{0,n} + t_{1,n} + \cdots + t_{w-1,n} + e_n
\end{pmatrix} \qquad \text{(B.2)}$$

where $t_{i,j}$ is the time attributable to pass $i$ through the square and multiply
loop for observation $j$, and $e_j$ represents the remaining timing and error
components, including loop overhead and measurement error. Each $t_{i,j}$ depends on
the value of the key bit $k_i$ and the value of the cipher-text $d_j$. Initially, all
$t_{i,j}$ are unknown.

3.  Beginning with the first bit used by the exponentiation loop ($k_0$, which is the
most significant exponent bit in Algorithm 4), Eve iteratively considers each bit
$k_i$ of the key. For subsequent iterations, it is assumed that all previous key bits
($k_{i-1}, \ldots, k_0$) are known. Eve makes two hypotheses about $k_i$, namely

$$H_0 : k_i = 0, \qquad H_1 : k_i = 1$$

Given her knowledge of the implementation, Eve simulates the partial
computation for the first $i$ key bits and determines the time
$\sum_{q=0}^{i-1} t_{q,j}$ attributable to that portion of the modular exponentiation
for each observation. The remaining time, which is attributable to the unknown key
bits for which Eve cannot yet simulate the computation, is:

$$T_j^{H_m} = e_j + \sum_{p=0}^{w-1} t_{p,j} - \sum_{q=0}^{i-1} t_{q,j} \qquad \text{(B.3)}$$
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The  result  of  this  step  is  two  new  lists  of  adjusted  timings  Tf  0  and  Tf 1  for 
all  observations  (j  =  1,2,..., n). 

4. Eve then calculates the variance for $T_j^{H_0}$ and $T_j^{H_1}$ from the previous step.
Whichever hypothesis ($H_0 : k_i = 0$ or $H_1 : k_i = 1$) resulted in the largest
reduction in the variance over all adjusted computation times is chosen as the
next key bit.

5.  Eve  repeats  steps  3-4  for  each  bit  of  the  key  until  the  entire  key  is  known. 
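
The steps above can be sketched in Python as a toy simulation. This is a minimal sketch under stated assumptions, not Kocher's implementation: the exponentiation is assumed to be a right-to-left square-and-multiply loop, the multiplication time is modelled by a hypothetical cost function `mul_time` (the Hamming weight of the modular product), squaring time is ignored, and no backtracking is performed. All function and variable names are illustrative.

```python
import random

def mul_time(a, b, n):
    """Hypothetical data-dependent cost: Hamming weight of a*b mod n."""
    return bin((a * b) % n).count("1")

def timed_modexp(c, key_bits, n, rng):
    """Right-to-left square-and-multiply; returns the simulated total time T_j."""
    r, s, total = 1, c, 0.0
    for k in key_bits:                      # key_bits[0] is the LSB k_0
        if k:
            total += mul_time(r, s, n)      # t_{i,j}: only paid when k_i = 1
            r = (r * s) % n
        s = (s * s) % n
    return total + rng.gauss(0.0, 1.0)      # e_j: measurement noise

def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def recover_key(ciphertexts, timings, w, n):
    """Steps 3-5: pick, bit by bit, the hypothesis with the lower residual variance."""
    remaining = list(timings)               # adjusted timings so far
    states = [(1, c) for c in ciphertexts]  # simulated (r, s) per observation
    guessed = []
    for _ in range(w):
        cost = [mul_time(r, s, n) for r, s in states]
        adj_h1 = [t - m for t, m in zip(remaining, cost)]   # H1: k_i = 1
        # Under H0 (k_i = 0) nothing is subtracted, so 'remaining' is unchanged.
        bit = 1 if variance(adj_h1) < variance(remaining) else 0
        guessed.append(bit)
        if bit:
            remaining = adj_h1
            states = [((r * s) % n, (s * s) % n) for r, s in states]
        else:
            states = [(r, (s * s) % n) for r, s in states]
    return guessed

rng = random.Random(2024)
n = 65521 * 65537                            # arbitrary odd modulus for the toy model
w = 16
key_bits = [rng.getrandbits(1) for _ in range(w)]
ciphertexts = [rng.randrange(2, n) for _ in range(2000)]
timings = [timed_modexp(c, key_bits, n, rng) for c in ciphertexts]
recovered = recover_key(ciphertexts, timings, w, n)
```

With a few thousand observations the correct hypothesis lowers the variance by roughly the variance of one multiplication time, while the wrong one raises it by about the same amount, which is what makes the per-bit decision reliable in this model.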


Under the assumptions of independence and normal distribution of multiplication
times, Kocher derived the probability that a wrong hypothesis will result
in a greater reduction in variance of the adjusted computation times. When this
does occur, the variance calculated in subsequent steps will actually begin to grow,
indicating that an incorrect key bit was selected and that some backtracking is
required to correct the error [Koc96, Roh06]. Kocher refers to this as an error-detection
property of the timing attack.


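In the general case, the probability at each step of selecting the correct
hypothesis for the current key bit (given all previous key bits were chosen correctly) is
given by:

$$
P\left[K_i = 0 \mid \mathrm{Var}(T_j^{H_0}) < \mathrm{Var}(T_j^{H_1})\right]
= P\left[K_i = 1 \mid \mathrm{Var}(T_j^{H_1}) < \mathrm{Var}(T_j^{H_0})\right]
= \Phi\left(\sqrt{\frac{n(i - c)}{2(w - i)}}\right)
$$

where $\Phi$ is the standard normal cumulative distribution function and $c$ is the
index of the first incorrectly guessed key bit.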
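
The dependence of this probability on the number of observations can be illustrated numerically. The sketch below assumes the argument of $\Phi$ is the square root of $n(i-c)/(2(w-i))$, as in Kocher's derivation; the function names `phi` and `p_correct` and the example values are illustrative only.

```python
import math

def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def p_correct(n, i, c, w):
    """Evaluate Phi(sqrt(n (i - c) / (2 (w - i)))); requires c < i < w."""
    return phi(math.sqrt(n * (i - c) / (2.0 * (w - i))))

# Illustrative values only: a 64-bit exponent, with 10 and 100 observations.
few = p_correct(10, 10, 5, 64)
many = p_correct(100, 10, 5, 64)
```

In this form the probability grows with the number of observations $n$ and shrinks as more unknown bits ($w - i$) remain.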


Thus, Eve can make trades between the number of observations and the
post-processing time (including backtracking for wrong guesses) to optimize the efficiency
of her attack.
Appendix C. Kocher’s SPA Attack on DES


Kocher’s original SPA attack illustrates a typical application of SSCA. The principle
of the analysis is that the power consumption of a micro-controller looks different
for different operations. For instance, the side channel leakage may look significantly
different when a conditional branch is taken compared to when the branch is not
taken. Figure C.1 shows the current drawn by a micro-controller-based DES
implementation for two sets of operations over a period of seven clock cycles [KJJ99].
Close examination of the traces reveals that they have very similar (nearly identical)
current profiles until they reach clock cycles 6-7. Clock cycles 6-7 correspond to the
micro-controller’s execution of a conditional branch instruction. In one of the traces
the branch is taken, and in the other trace the branch is not taken.


[Figure: current traces; x-axis: Time (in 3.5714 MHz clock cycles)]

Figure C.1 Variations in Current Drawn Due to Different Conditional Branch
Decisions (Adapted from [KJJ99])


In this example, the device being analyzed is a micro-controller running DES
code, and the differences in current drawn by the device occur at times when the
micro-controller is making conditional branch decisions based on a portion (one
bit in this case) of the cryptographic key. By applying publicly available knowledge
of the DES algorithm and domain knowledge of how the algorithm would be
implemented in practice, it is possible to deduce the value of the cryptographic key from
these differences.

In Kocher’s example, the target is a (software, microprocessor-based)
smart-card implementation of the Data Encryption Standard (DES).

Listing C.1 Assembler Code for DES Conditional Branch [?]

    M_SHIFT_C MACRO
        CLR     C
        MOV     A, PB_4_C
        RLC     A
        MOV     PB_4_C, A
        MOV     A, PB_3_C
        RLC     A
        MOV     PB_3_C, A
        MOV     A, PB_2_C
        RLC     A
        MOV     PB_2_C, A
        MOV     A, PB_1_C
        RLC     A
        MOV     PB_1_C, A
        JC      M_SHIFTC1
        CLR     PB_4_C.4
        JMP     M_SHIFTC2
    M_SHIFTC1:
        SETB    PB_4_C.4, #1
    M_SHIFTC2:
    ENDM


The attack begins by analyzing the DES algorithm to identify some intermediate
value that, if known, would reveal a portion of the key. For DES, examining the
publicly available specification reveals that the generation of the sub-key used in each
round of ciphering requires the rotation of two 28-bit key registers. Applying domain
knowledge of how a rotation is likely to be implemented for a micro-controller-based
implementation, it is expected that most implementations will optimize the 28-bit
rotation operation and only copy the most-significant bit (MSB) to the least-significant
bit (LSB) if it is a ‘1’. Otherwise, the code will probably perform a less costly shift
operation, which would automatically fill the LSB with a ‘0’ value. Thus, it is
anticipated that straightforward implementations of DES will make a conditional
branch decision based on the MSB of the key register before each rotation.


For the micro-controller-based smart-card device targeted, the power trace
data looks significantly different for operations where a conditional branch is taken
vs. operations where it is not taken (Figure C.1). Therefore, it is possible to tell
whether the MSB of the key register is a ‘1’ or a ‘0’ by examining the power trace
for the moment when the conditional decision is made.


Furthermore, according to the Federal Information Processing Standard 46-3
[Nat99], a total of 28 left rotations of each 28-bit key register are used in the DES
algorithm. Therefore, every bit in the 56-bit DES key is subject to the above analysis
over the course of a full 16 round ciphering process, permitting extraction of the full
secret key by analyzing whether or not the jumps were taken.
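
The rotation count and the resulting key recovery can be checked with a short Python sketch. It is an illustrative model only: the shift schedule is the per-round left-shift table from FIPS 46-3, while the function names and the idea of recording the branch decision in a list (standing in for the power trace) are assumptions made for this example.

```python
# Per-round left-shift counts for the DES key registers (FIPS 46-3).
SHIFTS = [1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 1]
MASK28 = (1 << 28) - 1

def rotl28_leaky(reg, trace):
    """Rotate a 28-bit register left by one, recording the branch decision."""
    msb = (reg >> 27) & 1
    trace.append(msb)            # stand-in for 'branch taken' seen in the power trace
    reg = (reg << 1) & MASK28    # plain shift fills the LSB with 0
    if msb:                      # conditional branch on the old MSB
        reg |= 1
    return reg

def run_key_schedule(reg):
    """Apply all 16 rounds of rotations and collect the leaked decisions."""
    trace = []
    for count in SHIFTS:
        for _ in range(count):
            reg = rotl28_leaky(reg, trace)
    return reg, trace

def recover_register(trace):
    """Rebuild the initial register from the first 28 branch decisions (MSB first)."""
    value = 0
    for bit in trace[:28]:
        value = (value << 1) | bit
    return value

secret = 0xABCDEF1               # arbitrary 28-bit example value
final, trace = run_key_schedule(secret)
recovered = recover_register(trace)
```

Because the schedule contains exactly 28 single-bit rotations, each register bit passes through the MSB position once, and the register returns to its starting value after round 16.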


A  key  point  of  SPA  is  that  an  attacker  requires  substantial  knowledge  of  the 
algorithm  being  implemented  in  order  to  effectively  use  the  technique.  Since  the 
detailed  specifications  of  many  encryption  algorithms  are  publicly  available,  and  in 
most  cases  the  manufacturers  of  encryption  devices  advertise  the  type  of  encryption 
being  used,  this  information  is  frequently  easy  to  obtain.  Kocher’s  second  type 
of attack, DPA, and improvements made later are not as dependent on detailed
knowledge  of  the  implementation. 

The series of operations for both traces include an instruction that rotates the
contents of a register left by one bit. If the most significant bit in the register (before
the rotation) is a ‘1’ then a special carry register is set to ‘1’. If the MSB is a ‘0’
then the carry register is set to ‘0’.

A branch decision is then made based on the value of the carry register. If
the carry register is set, the branch is taken. Likewise, if the carry register is not
set, the branch is not taken. The two traces depict the current drawn by the device
for each case (branch taken vs. branch not taken). Close examination of the traces
reveals that both have almost identical current profiles until they reach clock cycles
6-7, which correspond to the location of the conditional branch operation. Thus,
by visually examining the power consumption side channel data, it is possible to
determine whether or not a conditional branch was taken and, in doing so,
to infer the value of the manipulated register’s MSB.
Appendix D. RSA Message Blinding

To protect the private key $k_{pri}$ during RSA decryption, one option (known as message
blinding) is to choose a random pair of numbers ($m_i$, $m_f$) such that

$$m_f^{-1} = m_i^{k_{pri}} \bmod n \tag{D.1}$$

where $n$ is used to represent the public modulus to prevent confusion with the mask
value $m$. To do this, $m_f$ can initially be chosen as a random number that is relatively
prime to $n$. Then,

$$m_i = (m_f^{-1})^{k_{pub}} \bmod n. \tag{D.2}$$

The input message (the cipher-text, $C$) is then masked prior to computing the RSA
modular exponentiation by first multiplying it by $m_i \bmod n$, or

$$C_m = C \times m_i \bmod n. \tag{D.3}$$

Decryption is performed as normal to recover the plain-text

$$P_m = (C_m)^{k_{pri}} \bmod n. \tag{D.4}$$

However, since the decryption was performed on the masked cipher-text, the resulting
plain-text is also masked. To recover the original plain-text, the masked plain-text
is multiplied by $m_f$:

$$P = P_m \times m_f \bmod n. \tag{D.5}$$
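
Equations (D.1) through (D.5) can be traced with a minimal Python sketch. The tiny textbook RSA parameters, the function name `blinded_decrypt`, and the use of Python's non-cryptographic `random` module are assumptions chosen purely for illustration; a real implementation would use full-size keys and a cryptographically secure random source.

```python
import math
import random

# Toy RSA parameters (illustrative only; far too small for real use).
p, q = 61, 53
n = p * q                  # public modulus, 3233
k_pub, k_pri = 17, 2753    # 17 * 2753 = 1 (mod 3120)

def blinded_decrypt(c, rng):
    """Decrypt c with message blinding, following (D.2)-(D.5)."""
    # Choose m_f relatively prime to n so that its inverse exists.
    while True:
        m_f = rng.randrange(2, n)
        if math.gcd(m_f, n) == 1:
            break
    m_i = pow(pow(m_f, -1, n), k_pub, n)   # (D.2): m_i = (m_f^-1)^k_pub mod n
    c_m = (c * m_i) % n                    # (D.3): mask the cipher-text
    p_m = pow(c_m, k_pri, n)               # (D.4): ordinary RSA decryption
    return (p_m * m_f) % n                 # (D.5): remove the mask

plaintext = 65
ciphertext = pow(plaintext, k_pub, n)
recovered = blinded_decrypt(ciphertext, random.Random(0))
```

The unmasking works because $P_m = C^{k_{pri}} \cdot m_i^{k_{pri}} = P \cdot m_f^{-1} \bmod n$, so multiplying by $m_f$ leaves $P$.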
Appendix E. AES Object Source Code

This appendix contains the source code for a Matlab AES implementation that
pre-computes and preserves all intermediate results specified algorithmically in
FIPS 197. This source code supports all FIPS 197 approved variants of AES,
including all key sizes (128, 192, and 256-bit) and modes (encryption and
decryption). The supported parameters are documented in AESObject.m.


Listing E.1 AESObject.m

    % ==========================================================================
    % ======== AES Object
    % ==========================================================================
    %
    % Created: Jun 2010
    % By: Maj Will Cobb
    % Last Modified: 28 Jul 2011
    % By: Cobb
    %
    % Implements FIPS 197 Advanced Encryption Standard (AES). Supports all
    % FIPS 197 specified key sizes and operation modes.
    %
    % This class will compute the output and all algorithmically specified (per
    % FIPS 197 specification) intermediate values of the AES encryption /
    % decryption given a specific input (plain or ciphertext depending on
    % mode) and key.
    %
    % Modified:
    %
    % 19 Jul 10 -- made drastic improvements to speed by
    %              going to a lookup-table based approach for AES polynomial
    %              multiplication.
    %
    % 14 Sep 10 -- Updated comments, prettied up hex output display
    %
    % 28 Jul 11 -- Moved lookup tables to separate .mat file to improve
    %              code readability.
    %
    % 18 Aug 11 -- Added get.HexKeySchedule & get.BinKeySchedule.
    %
    % Examples using Test Vectors from FIPS 197 Appendix C:
    %
    % myAES = AESObject;
    % myAES.HexInput = '00112233445566778899aabbccddeeff';
    % myAES.HexKey = '000102030405060708090a0b0c0d0e0f';
    % myAES.Encrypt;
    % myAES.HexOutput
    %
    % ans =
    %
    % 69C4E0D86A7B0430D8CDB78070B4C55A
    %
    % myAES.HexInput = myAES.HexOutput;
    % myAES.Decrypt;
    % myAES.HexOutput
    %
    % ans =
    %
    % 00112233445566778899AABBCCDDEEFF
    %
    % % Get Key Schedule:
    % myAES.w
    %
    % ans =
    %
    % [0,1,2,3;4,5,6,7;8,9,10,11;12,13,14,15;214,170,116,253;210,175,114,250;
    %  218,166,120,241;214,171,118,254;182,146,207,11;100,61,189,241;
    %  190,155,197,0;104,48,179,254;182,255,116,78;210,194,201,191;
    %  108,89,12,191;4,105,191,65;71,247,247,188;149,53,62,3;249,108,50,188;
    %  253,5,141,253;60,170,163,232;169,159,157,235;80,243,175,87;
    %  173,246,34,170;94,57,15,125;247,166,146,150;167,85,61,193;
    %  10,163,31,107;20,249,112,26;227,95,226,140;68,10,223,77;78,169,192,38;
    %  71,67,135,53;164,28,101,185;224,22,186,244;174,191,122,210;
    %  84,153,50,209;240,133,87,104;16,147,237,156;190,44,151,78;
    %  19,17,29,127;227,148,74,23;243,7,167,139;77,43,48,197;]
    %
    % % Get hex representation of key schedule (1 round key / row)
    % myAES.HexKeySchedule
    %
    % ans =
    %
    % 000102030405060708090A0B0C0D0E0F
    % D6AA74FDD2AF72FADAA678F1D6AB76FE
    % B692CF0B643DBDF1BE9BC5006830B3FE
    % B6FF744ED2C2C9BF6C590CBF0469BF41
    % 47F7F7BC95353E03F96C32BCFD058DFD
    % 3CAAA3E8A99F9DEB50F3AF57ADF622AA
    % 5E390F7DF7A69296A7553DC10AA31F6B
    % 14F9701AE35FE28C440ADF4D4EA9C026
    % 47438735A41C65B9E016BAF4AEBF7AD2
    % 549932D1F08557681093ED9CBE2C974E
    % 13111D7FE3944A17F307A78B4D2B30C5
    %
    % % Get intermediate values:
    % AES_Intermediates = myAES.IV
    %
    % ans =
    %
    %     temp: [70x4 double]
    %        S: [41x4x4 double]
    %
    % Note that temp is the intermediates calculated in key
    % schedule generation
    %
    % S is the 4x4 AES state array at each intermediate point in
    % the calculation. The first dimension is the designator
    % for the intermediate value in order of computation. Size of
    % this dimension will depend on AES key size (128,192,256)

    classdef AESObject < hgsetget

        properties (Constant)
            cENCRYPT_MODE = 1;
            cDECRYPT_MODE = 0;
            Nb = 4;
            binLookup = de2bi(0:255, 'left-msb');
        end

        properties
            KeyObjectHandle;
            IV = {};

            % *** Default input / keys are FIPS 197 test vectors ***
            HexInput = {'32' '88' '31' 'e0';
                        '43' '5a' '31' '37';
                        'f6' '30' '98' '07';
                        'a8' '8d' 'a2' '34'};

            HexKey = {'2b' '7e' '15' '16' '28' 'ae' 'd2' 'a6' ...
                      'ab' 'f7' '15' '88' '09' 'cf' '4f' '3c'};
            % HexKey = {'8e' '73' 'b0' 'f7' 'da' '0e' '64' '52' ...
            %           'c8' '10' 'f3' '2b' '80' '90' '79' 'e5' ...
            %           '62' 'f8' 'ea' 'd2' '52' '2c' '6b' '7b'};
            % HexKey = {'60' '3d' 'eb' '10' '15' 'ca' '71' 'be' ...
            %           '2b' '73' 'ae' 'f0' '85' '7d' '77' '81' ...
            %           '1f' '35' '2c' '07' '3b' '61' '08' 'd7' ...
            %           '2d' '98' '10' 'a3' '09' '14' 'df' 'f4'};

            BinOutput = logical(zeros(1, 64));
            DecOutput = uint8(zeros(1, 8));

            % S is current AES State matrix
            S = zeros(4, 4);

            % These lookup tables are loaded from a pre-computed .mat file
            RotWord = [];
            P = [];
            ShiftRows = [];
            InvShiftRows = [];
            SBox = [];
            InvSBox = [];
            RCon = [];
            PM = [];

        end;

        properties (Dependent = true)
            Nk; % Valid values are 4, 6, 8 for AES-128, 192, 256
            Nr; % Valid values are 10, 12, 14 for AES-128, 192, 256
            HexOutput;
            IntInput;
            IntOutput;
            IntKey;
            HexKeySchedule;
            BinKeySchedule;
            Mode;
            w;  % Key schedule, computed on demand by get.w
        end;

        properties (Access = private)
            priMode = 1;
            priKeyScheduleValid = 0;
            priKeySize = 128;
            priKeySchedule;
        end;

        methods

            % Re-run the cipher in the currently selected mode.
            function UpdateOutput(a)
                if (a.Mode == a.cENCRYPT_MODE)
                    a.Encrypt;
                elseif (a.Mode == a.cDECRYPT_MODE)
                    a.Decrypt;
                else
                    error('Invalid AES Mode Specified. Must be 1 (encrypt) or 0 (decrypt)');
                end;
            end;

            function Nk = get.Nk(a)
                Nk = a.priKeySize / 32;
            end;

            function Nr = get.Nr(a)
                Nr = (a.priKeySize / 32) + 6;
            end;

            function HexOutput = get.HexOutput(a)
                HexOutput = reshape((dec2hex(a.S, 2))', 1, []);
            end

            function IntKey = get.IntKey(a)
                IntKey = hex2dec(a.HexKey)';
            end

            function IntInput = get.IntInput(a)
                IntInput = hex2dec(a.HexInput);
            end
            function IntOutput = get.IntOutput(a)
                IntOutput = reshape(a.S, 1, []);
            end

            % One 32-character hex string per round key (one round key per row).
            function HexKeySchedule = get.HexKeySchedule(a)
                HexKeySchedule = [];
                for i = 1:a.Nr+1
                    HexKeySchedule = [HexKeySchedule; ...
                        reshape(dec2hex(reshape(a.w(4*(i-1)+1:4*(i-1)+4, :)', ...
                        1, []), 2)', 1, 32)];
                end
            end

            % One row of space-separated 8-bit binary strings per round key.
            function BinKeySchedule = get.BinKeySchedule(a)
                BinKeySchedule = [];
                for iRnd = 1:a.Nr+1
                    rnd_key = reshape(a.w(4*(iRnd-1)+1:4*(iRnd-1)+4, :)', 1, []);
                    tmp = [];
                    for iByte = 1:16
                        if iByte > 1
                            tmp = [tmp ' '];
                        end
                        tmp = [tmp dec2bin(rnd_key(iByte), 8)];
                    end
                    BinKeySchedule = [BinKeySchedule; tmp];
                end
            end

            % AESObject constructor.
            function a = AESObject(hexinput, hexkey, mode)
                load AESObject_lookup_tables;
                a.InvSBox = InvSBox; clear InvSBox;
                a.PM = PM; clear PM;
                a.RCon = RCon; clear RCon;
                a.SBox = SBox; clear SBox;
                a.ShiftRows = ShiftRows; clear ShiftRows;
                a.InvShiftRows = InvShiftRows; clear InvShiftRows;
                a.P = P; clear P;
                a.RotWord = RotWord; clear RotWord;

                if nargin
                    a.HexInput = hexinput;
                    a.HexKey = hexkey;
                end

                if nargin > 2
                    a.Mode = mode;
                else
                    a.Mode = a.cENCRYPT_MODE; % Default is 'encrypt' mode
                end
            end;

            % Accepts either a 16-element cell array of hex bytes or a
            % 32-character hex string.
            function set.HexInput(a, hexinput)
                if iscellstr(hexinput)
                    if (length(hexinput) ~= 16)
                        error('Invalid hexadecimal input representation. Must be 16 bytes');
                    end;
                    a.HexInput = hexinput;
                else
                    if (length(hexinput) ~= 32)
                        error('Invalid hexadecimal input representation. Must be 16 bytes');
                    end;
                    for idx = 1:(length(hexinput)/2)
                        a.HexInput(idx) = cellstr(hexinput(2*idx-1:2*idx));
                    end
                end;
            end;

            % Accepts either a cell array of 16, 24, or 32 hex bytes or the
            % equivalent hex string.
            function set.HexKey(a, hexkey)
                if iscellstr(hexkey)
                    if isempty(find([16 24 32] == length(hexkey), 1))
                        error('Invalid hexadecimal key representation. Must be 16, 24, or 32 bytes long.');
                    end;
                    a.HexKey = hexkey;
                else
                    if isempty(find([32 48 64] == length(hexkey), 1))
                        error('Invalid hexadecimal key representation. Must be 16, 24, or 32 bytes long.');
                    end;
                    for idx = 1:(length(hexkey)/2)
                        a.HexKey(idx) = cellstr(hexkey(2*idx-1:2*idx));
                    end
                end;
                a.priKeySize = length(a.HexKey) * 8;
                a.priKeyScheduleValid = 0;
            end;

            function mode = get.Mode(a)
                mode = a.priMode;
            end;

            % Validate the requested mode before storing it.
            function set.Mode(a, mode)
                if (mode == a.cENCRYPT_MODE)
                    a.priMode = mode;
                elseif (mode == a.cDECRYPT_MODE)
                    a.priMode = mode;
                else
                    error('Invalid AES Mode Specified. Must be 0 or 1');
                end;
                a.UpdateOutput;
            end;

            %% Implement the AES Key Expansion algorithm per FIPS 197.
            % Verified working for FIPS 197 test keys (all sizes)
            function keyschedule = get.w(a)
                if a.priKeyScheduleValid
                    keyschedule = a.priKeySchedule;
                else
                    keyschedule = zeros(4*(a.Nr+1), 4); % 44, 52, or 60 rows for 128/192/256 bit
                    keyschedule(1:a.Nk, :) = (reshape(a.IntKey, 4, a.Nk))';

                    % Pre-allocate memory to store intermediate values from key scheduling
                    if (a.priKeySize == 128) || (a.priKeySize == 192)
                        a.IV.temp = zeros(70, 4);
                    else
                        a.IV.temp = zeros(79, 4);
                    end

                    i = a.Nk;
                    hist_idx = 1;

                    while i < (a.Nb * (a.Nr + 1))
                        temp = keyschedule(i, :); % Remember ... base 1 not base 0
                        a.IV.temp(hist_idx, :) = temp;
                        hist_idx = hist_idx + 1;

                        if (mod(i, a.Nk) == 0)
                            temp = temp(a.RotWord);
                            a.IV.temp(hist_idx, :) = temp;
                            hist_idx = hist_idx + 1;

                            temp = a.SBox(temp + 1);
                            a.IV.temp(hist_idx, :) = temp;
                            hist_idx = hist_idx + 1;

                            temp = bitxor(temp, a.RCon((i / a.Nk), :));
                            a.IV.temp(hist_idx, :) = temp;
                            hist_idx = hist_idx + 1;
                        elseif (a.Nk > 6 && (mod(i, a.Nk) == 4))
                            temp = a.SBox(temp + 1);
                            a.IV.temp(hist_idx, :) = temp;
                            hist_idx = hist_idx + 1;
                        end % if
                        keyschedule(i+1, :) = bitxor(keyschedule((i+1) - a.Nk, :), temp);
                        i = i + 1;
                    end; % while

                    a.priKeySchedule = keyschedule;
                    a.priKeyScheduleValid = 1;
                end; % if
            end

        end;

    end
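
The key expansion performed by `get.w` can be cross-checked independently. The Python sketch below is not a translation of the Matlab class: it rebuilds the S-box from the FIPS 197 definition instead of loading lookup tables, and the names `gmul`, `build_sbox`, and `expand_key` are illustrative. It reproduces the round keys listed in the header comments of AESObject.m for the FIPS 197 Appendix C key.

```python
def gmul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    product = 0
    for _ in range(8):
        if b & 1:
            product ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
        b >>= 1
    return product

def build_sbox():
    """S-box: multiplicative inverse in GF(2^8) followed by the affine transform."""
    sbox = []
    for x in range(256):
        inv = 0 if x == 0 else next(y for y in range(1, 256) if gmul(x, y) == 1)
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox

SBOX = build_sbox()

def expand_key(key):
    """Return the key schedule as a list of 4-byte words (Nb * (Nr + 1) words)."""
    nk = len(key) // 4            # 4, 6, or 8 words
    nr = nk + 6                   # 10, 12, or 14 rounds
    w = [list(key[4 * i:4 * i + 4]) for i in range(nk)]
    rcon = 1
    for i in range(nk, 4 * (nr + 1)):
        temp = list(w[i - 1])
        if i % nk == 0:
            temp = temp[1:] + temp[:1]           # RotWord
            temp = [SBOX[b] for b in temp]       # SubWord
            temp[0] ^= rcon                      # XOR with round constant
            rcon = gmul(rcon, 2)
        elif nk > 6 and i % nk == 4:
            temp = [SBOX[b] for b in temp]       # extra SubWord for AES-256
        w.append([a ^ b for a, b in zip(w[i - nk], temp)])
    return w

key = bytes(range(16))            # 000102...0f, the FIPS 197 Appendix C key
schedule = expand_key(key)
round_keys = ["".join(f"{b:02X}" for word in schedule[4 * r:4 * r + 4] for b in word)
              for r in range(11)]
```

Here `round_keys[r]` corresponds to row `r + 1` of `HexKeySchedule` in the Matlab class, since Matlab indexing is 1-based.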


Listing E.2 Encrypt.m

    function out = Encrypt(a)
        Nr = a.Nr;
        Nb = a.Nb;
        w = a.w;

        IV_S = zeros(5+4*(Nr-1), 4, 4);
        MCIVs = zeros(Nr-1, 4, 4, 4, 2);

        S = reshape(a.IntInput, 4, 4);
        a.S = S;
        IV_S(1, :, :) = S;

        S = bitxor(S, w(1:4, :)');                          % AddRoundKey
        IV_S(2, :, :) = S;

        for round = 1:(Nr-1)
            S = a.SBox(S + 1);                              % SubBytes
            IV_S(3+4*(round-1), :, :) = S;

            S = S(a.ShiftRows);                             % ShiftRows
            IV_S(4+4*(round-1), :, :) = S;
            a.S = S;

            a.MixColumns;                                   % MixColumns
            % MCIVs(round, 1) = a.MC_IV_1;
            % MCIVs(round, 2) = a.MC_IV_2;
            S = a.S;
            IV_S(5+4*(round-1), :, :) = S;

            S = bitxor(S, w((1+4*round):(4+4*round), :)');  % AddRoundKey
            IV_S(6+4*(round-1), :, :) = S;
        end

        S = a.SBox(S + 1);                                  % SubBytes
        IV_S(3+4*(Nr-1), :, :) = S;

        S = S(a.ShiftRows);                                 % ShiftRows
        IV_S(4+4*(Nr-1), :, :) = S;

        S = bitxor(S, w(Nb*Nr+1:Nb*Nr+4, :)');              % AddRoundKey
        IV_S(5+4*(Nr-1), :, :) = S;

        a.S = S;
        out = a.HexOutput;
        a.IV.S = IV_S;
        a.IV.MC = MCIVs;
    end


Listing E.3 Decrypt.m

    function out = Decrypt(a)
        % Don't recalc these each time...!
        Nr = a.Nr;
        Nb = a.Nb;
        w = a.w;

        IV_S = zeros(5+4*(Nr-1), 4, 4);

        % Set initial state to input
        S = reshape(a.IntInput, 4, 4);
        IV_S(1, :, :) = S;

        S = bitxor(S, w(Nb*Nr+1:Nb*Nr+4, :)');              % AddRoundKey
        IV_S(2, :, :) = S;

        for round = (Nr-1):-1:1
            S = S(a.InvShiftRows);                          % InvShiftRows
            IV_S(3+4*(Nr-round-1), :, :) = S;

            S = a.InvSBox(S + 1);                           % InvSubBytes
            IV_S(4+4*(Nr-round-1), :, :) = S;

            S = bitxor(S, w((1+4*round):(4+4*round), :)');  % AddRoundKey
            IV_S(5+4*(Nr-round-1), :, :) = S;
            a.S = S;

            a.InvMixColumns;                                % InvMixColumns
            S = a.S;
            IV_S(6+4*(Nr-round-1), :, :) = S;
        end

        S = S(a.InvShiftRows);                              % InvShiftRows
        IV_S(3+4*(Nr-1), :, :) = S;

        S = a.InvSBox(S + 1);                               % InvSubBytes
        IV_S(4+4*(Nr-1), :, :) = S;

        S = bitxor(S, w(1:4, :)');                          % AddRoundKey
        IV_S(5+4*(Nr-1), :, :) = S;

        a.S = S;
        out = a.HexOutput;
        a.IV.S = IV_S;
    end

Listing E.4 MixColumns.m

    function MixColumns(a)
        % Commented out for efficiency. This version is SLOW! Using table lookup
        % of polymul instead.
        %
        % function out = polymul(x,y)
        %
        %     [~, r] = deconv(conv(x, y), a.P);
        %     tmp = mod(r, 2);
        %
        %     % Binary to decimal conversion... much faster than bi2de!
        %     out = [128 64 32 16 8 4 2 1] * tmp(end-7:end)';
        %
        % end

        S = a.S;

        for col = 1:4
            s0 = S(1, col);
            s1 = S(2, col);
            s2 = S(3, col);
            s3 = S(4, col);

            S(1, col) = bitxor(bitxor(a.PM(2+1, s0+1), a.PM(3+1, s1+1)), bitxor(s2, s3));
            S(2, col) = bitxor(bitxor(s0, a.PM(2+1, s1+1)), bitxor(a.PM(3+1, s2+1), s3));
            S(3, col) = bitxor(bitxor(s0, s1), bitxor(a.PM(2+1, s2+1), a.PM(3+1, s3+1)));
            S(4, col) = bitxor(bitxor(a.PM(3+1, s0+1), s1), bitxor(s2, a.PM(2+1, s3+1)));

            % This version is SLOW!
            %
            % a.S(1, col) = bitxor(bitxor(polymul([1 0], s0b), polymul([1 1], s1b)), ...
            %                      bitxor(s2, s3));
            % a.S(2, col) = bitxor(bitxor(s0, polymul([1 0], s1b)), ...
            %                      bitxor(polymul([1 1], s2b), s3));
            % a.S(3, col) = bitxor(bitxor(s0, s1), ...
            %                      bitxor(polymul([1 0], s2b), polymul([1 1], s3b)));
            % a.S(4, col) = bitxor(bitxor(polymul([1 1], s0b), s1), ...
            %                      bitxor(s2, polymul([1 0], s3b)));
        end

        a.S = S;

    end
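
The table-lookup approach used in MixColumns.m and InvMixColumns.m can be illustrated in Python. This sketch assumes that `PM(c+1, x+1)` in the Matlab code holds the GF(2^8) product of the constant `c` and the byte `x`; that reading is inferred from how the table is indexed, since the contents of the `.mat` file are not shown. The names `PM`, `mix_column`, and `inv_mix_column` below are illustrative.

```python
def gmul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    product = 0
    for _ in range(8):
        if b & 1:
            product ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
        b >>= 1
    return product

# Pre-computed product table: PM[c][x] = c * x in GF(2^8), for c = 0..15.
PM = [[gmul(c, x) for x in range(256)] for c in range(16)]

def mix_column(col):
    """Apply the FIPS 197 MixColumns transform to one 4-byte column."""
    s0, s1, s2, s3 = col
    return [
        PM[2][s0] ^ PM[3][s1] ^ s2 ^ s3,
        s0 ^ PM[2][s1] ^ PM[3][s2] ^ s3,
        s0 ^ s1 ^ PM[2][s2] ^ PM[3][s3],
        PM[3][s0] ^ s1 ^ s2 ^ PM[2][s3],
    ]

def inv_mix_column(col):
    """Apply the FIPS 197 InvMixColumns transform to one 4-byte column."""
    s0, s1, s2, s3 = col
    return [
        PM[14][s0] ^ PM[11][s1] ^ PM[13][s2] ^ PM[9][s3],
        PM[9][s0] ^ PM[14][s1] ^ PM[11][s2] ^ PM[13][s3],
        PM[13][s0] ^ PM[9][s1] ^ PM[14][s2] ^ PM[11][s3],
        PM[11][s0] ^ PM[13][s1] ^ PM[9][s2] ^ PM[14][s3],
    ]

column = [0xDB, 0x13, 0x53, 0x45]
mixed = mix_column(column)
restored = inv_mix_column(mixed)
```

Building the table once replaces every per-byte polynomial multiplication with a single indexed read, which is the speed-up noted in the listing's comments.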


Listing E.5 InvMixColumns.m

    function InvMixColumns(a)
        % Commented out for efficiency. This version is SLOW! Using table lookup
        % of polymul instead
        %
        % function out = polymul(x,y)
        %
        %     gfx = de2bi(x, 'left-msb');
        %     gfy = de2bi(y, 'left-msb');
        %
        %     [~, r] = deconv(conv(gfx, gfy), a.P);
        %     tmp = mod(r, 2);
        %
        %     out = bi2de(tmp, 'left-msb');
        %
        % end

        S = a.S;

        for col = 1:4
            s0 = S(1, col);
            s1 = S(2, col);
            s2 = S(3, col);
            s3 = S(4, col);

            S(1, col) = bitxor(bitxor(a.PM(14+1, s0+1), a.PM(11+1, s1+1)), ...
                               bitxor(a.PM(13+1, s2+1), a.PM(9+1, s3+1)));
            S(2, col) = bitxor(bitxor(a.PM(9+1, s0+1), a.PM(14+1, s1+1)), ...
                               bitxor(a.PM(11+1, s2+1), a.PM(13+1, s3+1)));
            S(3, col) = bitxor(bitxor(a.PM(13+1, s0+1), a.PM(9+1, s1+1)), ...
                               bitxor(a.PM(14+1, s2+1), a.PM(11+1, s3+1)));
            S(4, col) = bitxor(bitxor(a.PM(11+1, s0+1), a.PM(13+1, s1+1)), ...
                               bitxor(a.PM(9+1, s2+1), a.PM(14+1, s3+1)));

            % This version is SLOW!
            %
            % a.S(1, col) = bitxor(bitxor(polymul(14, s0), polymul(11, s1)), ...
            %                      bitxor(polymul(13, s2), polymul(9, s3)));
            % a.S(2, col) = bitxor(bitxor(polymul(9, s0), polymul(14, s1)), ...
            %                      bitxor(polymul(11, s2), polymul(13, s3)));
            % a.S(3, col) = bitxor(bitxor(polymul(13, s0), polymul(9, s1)), ...
            %                      bitxor(polymul(14, s2), polymul(11, s3)));
            % a.S(4, col) = bitxor(bitxor(polymul(11, s0), polymul(13, s1)), ...
            %                      bitxor(polymul(9, s2), polymul(14, s3)));
        end

        a.S = S;

    end
Appendix F. MemMapTRS Source Code

This appendix contains the source code used to wrap the Riscure Inspector ‘.trs’
format with a Matlab memory-mapped interface. This allows the very large data
files to be manipulated from disk in a manner similar to native Matlab matrices.
This code is used extensively by the RF-DNA Fingerprinting and Leakage Mapping
procedures. See the Matlab help files for general information on memory-mapping.

Listing F.1 MemMapTRS.m

    %% Class to access Riscure's .trs format via Matlab memmapfile
    %
    % Syntax:
    %
    %   memmap = MemMapTRS(TracePath, TRSFileName)
    %   new_memmap = MemMapTRS(TracePath, TRSFileName, TraceDataMat, ...
    %                          PTCTDataMat, XDelta)
    %
    % First two arguments (directory & filename) are mandatory. Others
    % are only needed if creating a new .trs file from an existing matrix
    % and data. To create a new .trs, use the second syntax along with
    % an (N_t X N_s) matrix containing the trace data. N_t is the number
    % of traces (rows); N_s is the number of samples per trace (columns).
    % XDelta is the time per sample or 1/SampleRate, used for scaling
    % X-Axis.
    %
    % This class allows direct access to the Riscure '.trs' trace set file
    % format using Matlab's memmapfile capability. Allows efficient
    % manipulation of -very- large data sets.
    %
    % Current weakness is the inability to fully treat it like a matrix.
    % It's not currently possible to index across specific columns and
    % multiple rows of the traceset.
    %
    % !*TODO*! -- Look into using Riscure's Java API to improve this aspect
    %
    % Author: Maj Will Cobb
    % Created: 28 Feb 2010
    % Last Modified: 28 Jul 2010 - added writeable flag to allow file
    %                31 Aug 2010 - added capability to create a .trs file from
    %                              a matrix and associated PT/CT/KY data
    %                 9 Sep 2010 - updated comments to better document
    %                              functionality.
classdef MemMapTRS < handle

    properties (Constant)  % Constants used in .trs header structure
        NT = hex2dec('41');  % Number of traces
        NS = hex2dec('42');  % Number of samples per trace
        SC = hex2dec('43');  % Sample coding: '000ABBBB' where A indicates
                             % integer (0) or floating point (1); B is length
                             % in bytes (must be in {1,2,4})
        DS = hex2dec('44');  % Length of cryptographic / supplementary data
        TS = hex2dec('45');  % Title space reserved for each trace
        GT = hex2dec('46');  % Global trace title
        DC = hex2dec('47');  % Description
        XO = hex2dec('48');  % Offset in X-axis for trace representation
        XL = hex2dec('49');  % Label of X-axis
        YL = hex2dec('4A');  % Label of Y-axis
        XS = hex2dec('4B');  % Scale value for X-axis
        YS = hex2dec('4C');  % Scale value for Y-axis
        TO = hex2dec('4D');  % Trace offset for displaying trace numbers
        LS = hex2dec('4E');  % Logarithmic scale
        TB = hex2dec('5F');  % Trace block marker; '5f 00' marks end of header
    end;  % properties (Constant)

    properties (Access = public)
        NumTraces = 0;
        NumSamples = 0;
        SampleCoding = 0;
        INTEGER_CODING = 1;  % Default is int8 coding
        FLOAT_CODING = 0;
        SampleLength = 1;    % Default is int8 coding
        DataLength = 0;
        TitleLength = 0;
        GlobalTitle = '';
        Description = '';
        XOffset = 0;
        XLabel = '';
        YLabel = '';
        XDelta = 0;
        YDelta = 0;
        TraceOffset = 0;
        LogScaleFlag = 0;
        coding_type = 0;
        data_format = {};
        memmap;
    end;  % public properties

    properties (Access = private)
        idx = 0;
    end;  % private properties

    methods

        %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
        % Constructor -- pass in directory (TracePath) and name of
        % TRS file to be memory mapped
        %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
        function m = MemMapTRS(TracePath, TRSName, TraceDataMat, PTCTDataMat, ...
                               XDelta)
            filepath = fullfile(TracePath, TRSName);

            if ~exist(filepath, 'file')
                if nargin < 3
                    error('Specified file:\n\n %s\n\n does not exist!', filepath);
                end
                m = CreateTRSFile(m, filepath, TraceDataMat, PTCTDataMat, XDelta);
            end;

            m = ParseHeader(m, filepath);
            m = MakeFormat(m);

            % Call memmap constructor
            m.memmap = memmapfile(filepath, ...
                'offset', m.idx, ...          % This offset is 0 based!!
                'format', m.data_format, ...
                'repeat', m.NumTraces, ...
                'Writable', true);


        end;  % MemMapTRS method

        function m = ParseHeader(m, filepath)
            %% Open the file and parse the header
            try
                fid = fopen(filepath, 'r');
                data_loc_found = 0;
                m.idx = 0;

                while (data_loc_found == 0)
                    tmp = fread(fid, 1, 'int8');

                    if (tmp == m.NT)
                        trash = fread(fid, 1, 'int8');
                        m.NumTraces = fread(fid, 1, 'int32', 'l');
                        m.idx = m.idx + 6;
                    elseif (tmp == m.NS)
                        trash = fread(fid, 1, 'int8');
                        m.NumSamples = fread(fid, 1, 'int32', 'l');
                        m.idx = m.idx + 6;
                    elseif (tmp == m.SC)
                        trash = fread(fid, 1, 'int8');
                        m.SampleCoding = fread(fid, 1, 'int8');
                        if (bitget(m.SampleCoding, 5) == 0)
                            m.INTEGER_CODING = 1;
                            m.FLOAT_CODING = 0;
                        else
                            m.INTEGER_CODING = 0;
                            m.FLOAT_CODING = 1;
                        end;

                        % Low bits give the sample length in bytes; the switch
                        % rejects any value outside {1,2,4}.
                        m.SampleLength = bitand(m.SampleCoding, 7);
                        switch bitand(m.SampleCoding, 7)
                            case 1
                                m.SampleLength = 1;
                            case 2
                                m.SampleLength = 2;
                            case 4
                                m.SampleLength = 4;
                            otherwise
                                display('ERROR: INVALID SAMPLE CODING DETECTED');
                                break;
                        end;

                        %m.idx = m.idx + 2;
                        m.idx = m.idx + 3;  % I think 2 was WRONG...?? should be 1
                                            % byte for code, 1 byte for length, 1
                                            % byte for sample coding

                    elseif (tmp == m.DS)
                        trash = fread(fid, 1, 'int8');
                        m.DataLength = fread(fid, 1, 'int16');
                        m.idx = m.idx + 4;
                    elseif (tmp == m.TS)
                        % IMPORTANT: THIS WON'T CORRECTLY HANDLE TITLE
                        % LONGER THAN 127 CHARACTERS (YET) !!!!!
                        trash = fread(fid, 1, 'int8');
                        m.TitleLength = fread(fid, 1, 'int8');
                        m.idx = m.idx + 3;
                    elseif (tmp == m.GT)
                        length = fread(fid, 1, 'int8');
                        m.GlobalTitle = char(fread(fid, length, 'int8'));
                        m.idx = m.idx + length + 2;
                    elseif (tmp == m.DC)
                        length = fread(fid, 1, 'uint8');
                        m.Description = char(fread(fid, length, 'int8'));
                        m.idx = m.idx + length + 2;
                    elseif (tmp == m.XO)
                        trash = fread(fid, 1, 'int8');
                        m.XOffset = fread(fid, 1, 'int32');
                        m.idx = m.idx + 6;
                    elseif (tmp == m.XL)
                        length = fread(fid, 1, 'int8');
                        m.XLabel = char(fread(fid, length, 'int8'));
                        m.idx = m.idx + length + 2;
                    elseif (tmp == m.YL)
                        length = fread(fid, 1, 'int8');
                        m.YLabel = char(fread(fid, length, 'int8'));
                        m.idx = m.idx + length + 2;
                    elseif (tmp == m.XS)
                        trash = fread(fid, 1, 'int8');
                        m.XDelta = fread(fid, 1, 'float32', 'l');
                        m.idx = m.idx + 6;
                    elseif (tmp == m.YS)
                        trash = fread(fid, 1, 'int8');
                        m.YDelta = fread(fid, 1, 'float32', 'l');
                        m.idx = m.idx + 6;
                    elseif (tmp == m.TO)
                        trash = fread(fid, 1, 'int8');
                        m.TraceOffset = fread(fid, 1, 'int32');
                        m.idx = m.idx + 6;
                    elseif (tmp == m.LS)
                        trash = fread(fid, 1, 'int8');
                        m.LogScaleFlag = fread(fid, 1, 'int8');
                        m.idx = m.idx + 3;
                    elseif (tmp == m.TB)
                        data_loc_found = 1;
                        m.idx = m.idx + 2;  % *** Not sure why this is 7, but
                                            % trial & error found this works...
                    else
                        m.idx = m.idx + 1;
                    end;
                end;

            catch SomeException
                SomeException  % Display any error information
                error('Error parsing .trs file header for file: %s', filepath);
            end;

            % Done parsing header ... close the file and prepare to map to memory
            fclose(fid);

        end;  % ParseHeader method

        function m = CreateTRSFile(m, filepath, trace_data, PTCT_data, XDelta)
            m.NumTraces = size(trace_data, 1);
            m.NumSamples = size(trace_data, 2);
            m.SampleCoding = bin2dec('00010100');  % Floating point (bit 5 = '1'),
                                                   % 4 bytes/sample (bits 3..0 = '4')
            m.DataLength = 48;
            m.TitleLength = 0;
            m.GlobalTitle = 'Global Title Goes Here';
            m.Description = 'Description Goes Here';
            m.XOffset = 0;
            m.XLabel = 'Seconds';
            m.XDelta = XDelta;
            m.YLabel = 'Volts';
            m.YDelta = 1;
            m.TraceOffset = 0;  % What is this?
            m.LogScaleFlag = 0;
            m.FLOAT_CODING = 1;
            m.SampleLength = 4;

            try
                % Open the file and write the header
                fid = fopen(filepath, 'w');

                fwrite(fid, m.NT, 'int8');
                fwrite(fid, 4, 'int8');
                fwrite(fid, m.NumTraces, 'int32', 'l');

                fwrite(fid, m.NS, 'int8');
                fwrite(fid, 4, 'int8');
                fwrite(fid, m.NumSamples, 'int32', 'l');

                fwrite(fid, m.SC, 'int8');
                fwrite(fid, 1, 'int8');
                fwrite(fid, m.SampleCoding, 'uint8');

                fwrite(fid, m.DS, 'int8');
                fwrite(fid, 2, 'int8');
                fwrite(fid, m.DataLength, 'int16');

                fwrite(fid, m.TS, 'int8');
                fwrite(fid, 1, 'int8');
                fwrite(fid, m.TitleLength, 'int8');

                fwrite(fid, m.GT, 'int8');
                fwrite(fid, length(m.GlobalTitle), 'int8');
                fwrite(fid, uint8(m.GlobalTitle), 'uint8');

                fwrite(fid, m.DC, 'int8');
                fwrite(fid, length(m.Description), 'int8');
                fwrite(fid, uint8(m.Description), 'uint8');

                fwrite(fid, m.XO, 'int8');
                fwrite(fid, 4, 'int8');
                fwrite(fid, 0, 'int32', 'l');

                fwrite(fid, m.XL, 'int8');
                fwrite(fid, length(m.XLabel), 'int8');
                fwrite(fid, uint8(m.XLabel), 'uint8');

                fwrite(fid, m.YL, 'int8');
                fwrite(fid, length(m.YLabel), 'int8');
                fwrite(fid, uint8(m.YLabel), 'uint8');

                fwrite(fid, m.XS, 'int8');
                fwrite(fid, 4, 'int8');
                fwrite(fid, m.XDelta, 'float32');

                fwrite(fid, m.YS, 'int8');
                fwrite(fid, 4, 'int8');
                fwrite(fid, m.YDelta, 'float32');

                fwrite(fid, m.TO, 'int8');
                fwrite(fid, 4, 'int8');
                fwrite(fid, m.TraceOffset, 'int32', 'l');

                fwrite(fid, m.LS, 'int8');
                fwrite(fid, 1, 'int8');
                fwrite(fid, m.LogScaleFlag, 'uint8');

                fwrite(fid, m.TB, 'int8');
                fwrite(fid, 0, 'int8');

                % Write each row with PTCTData....
                for trace_idx = 1:size(trace_data, 1)
                    fwrite(fid, PTCT_data(trace_idx, :), 'uint8');
                    fwrite(fid, trace_data(trace_idx, :), 'single');
                end

                % Done writing file ... close
                fclose(fid);

            catch SomeException
                SomeException  % Display any error information
                error('Error creating file header for specified file: %s', ...
                      filepath);
            end;

        end;  % CreateTRSFile method


        % After reading the appropriate parameters from the file header,
        % construct the format that will be used to map the file to memory.
        function m = MakeFormat(m)
            if (m.FLOAT_CODING == 1)
                m.coding_type = 'single';
            elseif m.SampleLength == 1
                m.coding_type = 'int8';
            elseif m.SampleLength == 2
                m.coding_type = 'int16';
            elseif m.SampleLength == 4
                m.coding_type = 'int32';
            else
                error('ERROR: INVALID CODING TYPE DETECTED');
            end;


            % Four possible combinations for actual trace data block ... must have
            % actual tracedata, but Title and Data are optional.
            if ((m.DataLength > 0) && (m.TitleLength > 0))
                m.data_format = { 'uint8' [1 m.TitleLength] 'Title'; ...
                                  'uint8' [1 m.DataLength] 'PTCTData'; ...
                                  sprintf(m.coding_type) [1 m.NumSamples] 'tracedata' };
            elseif (m.DataLength > 0)
                m.data_format = { 'uint8' [1 m.DataLength] 'PTCTData'; ...
                                  sprintf(m.coding_type) [1 m.NumSamples] 'tracedata' };
            elseif (m.TitleLength > 0)
                m.data_format = { 'uint8' [1 m.TitleLength] 'Title'; ...
                                  sprintf(m.coding_type) [1 m.NumSamples] 'tracedata' };
            else  % No title or data areas allocated...
                m.data_format = { sprintf(m.coding_type) [1 m.NumSamples] 'tracedata' };
            end;

        end;  % MakeFormat method

    end;  % Methods

end
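
The tag-length-value header layout that ParseHeader walks can be illustrated on an in-memory byte vector. The script below is a sketch under stated assumptions rather than a replacement for the listing: the header bytes are made up for illustration, every length is assumed to fit in a single byte (the listing notes the same limitation for titles), and typecast assumes a little-endian host. The variable names hdr, info, and dataOffset are hypothetical.

```matlab
% Hypothetical header: 2 traces, 5 samples per trace, sample coding 0x14
% (floating point, 4 bytes per sample), then the trace block marker.
hdr = uint8([hex2dec('41') 4 2 0 0 0, ...
             hex2dec('42') 4 5 0 0 0, ...
             hex2dec('43') 1 hex2dec('14'), ...
             hex2dec('5F') 0]);

pos = 1;            % 1-based read position in hdr
info = struct();
while pos <= numel(hdr)
    tag = hdr(pos);                        % object tag
    len = double(hdr(pos + 1));            % length of the value in bytes
    val = hdr(pos + 2 : pos + 1 + len);    % value bytes (empty when len is 0)
    pos = pos + 2 + len;                   % advance past this object
    switch tag
        case hex2dec('41')                 % NT: number of traces
            info.NumTraces = double(typecast(val, 'uint32'));
        case hex2dec('42')                 % NS: samples per trace
            info.NumSamples = double(typecast(val, 'uint32'));
        case hex2dec('43')                 % SC: sample coding '000ABBBB'
            info.IsFloat = bitget(val, 5) == 1;
            info.SampleLength = double(bitand(val, uint8(7)));
        case hex2dec('5F')                 % TB: end of header
            break;                         % leaves the while loop
    end
end

% Zero-based byte offset of the first trace record, as memmapfile expects.
dataOffset = pos - 1;
```

For this header the objects occupy 6 + 6 + 3 + 2 bytes, so the computed offset matches the running total that ParseHeader accumulates in idx for the same tags.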